That is a good question. Names containing `.` are broken in particular by
SPARK-5632 (https://issues.apache.org/jira/browse/SPARK-5632), which I'd
like to fix.
There is a more general question of whether strings that are passed to
DataFrames should be treated as quoted identifiers (i.e. `as
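For reference, a sketch of the usual workaround in releases where dotted names are supported (not something from this thread; the DataFrame and column names are assumptions): escape the name with backticks so the `.` is not parsed as struct-field access:

```scala
// Hypothetical DataFrame with a dotted column name.
val df = sqlContext.createDataFrame(Seq((1, "x"))).toDF("id", "a.b")

// Backticks quote the identifier, so "a.b" is looked up as one column
// rather than as field "b" of a struct column "a".
df.select("`a.b`").show()
```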
Hi Gerard,
thanks for the hint about the singleton object; it seems very interesting.
However, when my singleton object (e.g. a handle to my DB) is supposed to
have a non-serializable member variable, I will again have a problem,
won’t I? At least I always run into issues where Python tries to
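One common workaround (a sketch with assumed names, not taken from this thread): keep the non-serializable handle as a lazy val inside the singleton object. A Scala object is not captured in task closures; each executor JVM initializes the object, and thus the handle, locally on first access:

```scala
import java.sql.{Connection, DriverManager}

// Hypothetical singleton; the connection is created lazily and
// independently in each JVM (driver and every executor), so it is
// never serialized as part of a task closure.
object DBHandle {
  lazy val connection: Connection =
    DriverManager.getConnection("jdbc:postgresql://dbhost/mydb")
}
```

Inside a `foreachPartition`, the first reference to `DBHandle.connection` then opens the connection on that executor rather than shipping it from the driver.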
Hi,
I had to setup a cron job for cleanup in $SPARK_HOME/work and in
$SPARK_LOCAL_DIRS.
Here are the cron lines. Unfortunately they are for *nix machines; I guess
you will have to adapt them substantially for Windows.
12 * * * * find $SPARK_HOME/work -cmin +1440 -prune -exec rm -rf {} \+
32 * * * *
Unfortunately the spark-hive-thriftserver hasn't been published yet; you
may either publish it locally or use it as an unmanaged SBT dependency.
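As a sketch of the unmanaged route (jar name and path are assumptions): sbt automatically puts any jars found in `lib/` on the classpath, so a locally built thrift-server jar can simply be dropped there, or referenced explicitly:

```scala
// build.sbt (sketch): add a locally built jar as an unmanaged dependency.
// sbt also picks up anything placed in the project's lib/ directory
// with no configuration at all.
unmanagedJars in Compile +=
  file("lib/spark-hive-thriftserver_2.10-1.3.0.jar")
```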
On 4/8/15 8:58 AM, Mohammed Guller wrote:
Hi –
I want to create an instance of HiveThriftServer2 in my Scala
application, so I imported the
On 10 Apr 2015, at 13:40, Lorenz Knies m...@l1024.org wrote:
I would consider it a bug that the “Yarn application state monitor” thread
dies on an exception that is, I think, even expected (at least in the Java
methods called further down the stack).
What do you think? Is it a problem that we
I have 3 transformations and then I run a foreach. The job goes to
NODE_LOCAL locality level, no executor is waiting, and for a long time
no task is running.
Regards,
Jeetendra
Thanks, Cheng.
BTW, there is another thread on the same topic. It looks like the thrift-server
will be published for 1.3.1.
Mohammed
From: Cheng Lian [mailto:lian.cs@gmail.com]
Sent: Saturday, April 11, 2015 5:37 AM
To: Mohammed Guller; user@spark.apache.org
Subject: Re: HiveThriftServer2
(Adding spark user list)
Hi Tom,
If I understand correctly, you're saying that you're running into memory
problems because the scheduler is allocating too many CPUs and not enough
memory to accommodate them, right?
In the case of fine-grained mode I don't think that's a problem, since we
have a fixed
The read seems to be successful, as the values for each field in the
record are different and correct. The problem is when I collect it or
trigger the next processing step (a join with another table); each of
these probably triggers serialization, and that's when all the fields in
the record get the value of the
first
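A common cause of this symptom (a guess, since the reading code isn't shown in the thread) is that Hadoop/Avro input formats reuse one mutable record object for every row, so every element of the RDD ends up aliasing the same object by the time a shuffle or collect serializes it. The usual fix is to copy each record's fields into an immutable value immediately after reading; the field names here are hypothetical:

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.spark.rdd.RDD

// Hypothetical: 'records' was read via an input format that reuses
// the record instance. Extract the fields eagerly so each element
// owns its own data before any join or collect.
def materialize(records: RDD[GenericRecord]): RDD[(String, String)] =
  records.map { rec =>
    (rec.get("guid").toString, rec.get("siteId").toString)
  }
```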
Have you created a class called SQLContextSingleton? If so, is it on the
compile classpath?
On Fri, Apr 10, 2015 at 6:47 AM, Mukund Ranjan (muranjan)
muran...@cisco.com wrote:
Hi All,
Any idea why I am getting this error?
wordsTenSeconds.foreachRDD((rdd: RDD[String], time: Time)
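For reference, a sketch of the SQLContextSingleton pattern the error likely refers to (as used in the Spark Streaming examples; assuming Spark 1.3 APIs):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Lazily instantiated singleton SQLContext, safe to reference from
// foreachRDD: it is created on demand in the running JVM rather than
// shipped inside the closure.
object SQLContextSingleton {
  @transient private var instance: SQLContext = _

  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}
```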
We have very large processing being done on Hadoop (400 M/R jobs, 1 day
duration, 100s of TB of data, 100s of joins). We are exploring Spark as an
alternative to speed up our processing time. We use Scala + Scoobie today,
and Avro is the data format across steps.
I observed a strange behavior: I read
Your first DDL should be correct (as long as the JDBC URL is correct).
The string after USING should be the data source name
(org.apache.spark.sql.jdbc, or simply jdbc).
The SQLException here indicates that Spark SQL couldn't find the SQL
Server JDBC driver on the classpath.
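For illustration, a sketch of such a DDL (URL, database, and table names are assumptions), with the SQL Server driver jar supplied via `spark-submit --jars` so it is on both driver and executor classpaths:

```scala
// Sketch: register an external SQL Server table as a Spark SQL
// temporary table through the built-in JDBC data source.
sqlContext.sql("""
  CREATE TEMPORARY TABLE people
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url 'jdbc:sqlserver://dbhost:1433;databaseName=mydb',
    dbtable 'dbo.people'
  )
""")
```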
As Denny said,
What do you mean by rules? Spark SQL optimization rules? Currently
these are entirely private to Spark SQL and are not configurable at
runtime.
Cheng
On 4/10/15 2:55 PM, Bruce Dou wrote:
Hi,
How do I manage the life cycle of Spark SQL and the rules applied to the
data stream? Enabling or
One possible approach is defining a UDT (user-defined type) for Joda
time. A UDT maps an arbitrary type to and from Spark SQL data types. You
may check ExamplePointUDT [1] for more details.
[1]:
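As a sketch of that approach (assuming Spark 1.3's @DeveloperApi UserDefinedType and Joda-Time on the classpath; the class name is an assumption), a UDT that stores a DateTime as epoch milliseconds:

```scala
import org.apache.spark.sql.types._
import org.joda.time.DateTime

// Sketch of a UDT mapping Joda DateTime to a LongType column
// (epoch milliseconds) and back.
class JodaDateTimeUDT extends UserDefinedType[DateTime] {
  override def sqlType: DataType = LongType

  override def serialize(obj: Any): Any = obj match {
    case dt: DateTime => dt.getMillis
  }

  override def deserialize(datum: Any): DateTime = datum match {
    case millis: Long => new DateTime(millis)
  }

  override def userClass: Class[DateTime] = classOf[DateTime]
}
```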
I have two RDDs:
leftRDD = RDD[(Long, (DetailInputRecord, VISummary, Long))]
and
rightRDD =
RDD[(Long, com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum)]
DetailInputRecord is an object that contains (guid, sessionKey,
sessionStartDate, siteID)
There are 10 records in leftRDD
I took that RDD, ran through it, and printed 4 elements from it; they all
printed correctly.
val x = viEvents.map {
  case (itemId, event) =>
    println(event.get("guid"), itemId, event.get("itemId"),
      event.get("siteId"))
    (itemId, event)
}
The above code prints