Hi,
I have recently enabled log4j.rootCategory=WARN, console in the Spark
configuration, but after that spark.logConf=true has become ineffective.
So I just want to confirm: is this because of log4j.rootCategory=WARN?
Thanks
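For reference (not from the original message): spark.logConf writes the effective SparkConf at INFO level, so a WARN root logger suppresses it. A minimal log4j.properties sketch that keeps the console quiet but re-enables Spark's own logger at INFO (logger names are the standard ones, shown for illustration):

```
# Root stays at WARN to keep third-party logging quiet
log4j.rootCategory=WARN, console

# Spark logs its config (spark.logConf) at INFO, so allow INFO for Spark itself
log4j.logger.org.apache.spark=INFO
```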
Hi,
I'm getting a ClassNotFoundException at the executor when trying to
register a class for Kryo serialization:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
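Not from the thread itself, but a minimal sketch of Kryo class registration for comparison; com.example.MyClass is a placeholder, and the registered class must be in a jar shipped to the executors (e.g. via spark-submit --jars), otherwise the executor fails with exactly this kind of ClassNotFoundException:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Placeholder class: it must be on the executors' classpath,
  // e.g. distributed with spark-submit --jars
  .registerKryoClasses(Array(classOf[com.example.MyClass]))
```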
Hi guys,
Trying to use the SparkSQL Thrift server with the Hive metastore. It seems
that Hive metastore impersonation works fine (when running Hive tasks).
However, when spinning up the SparkSQL Thrift server, impersonation doesn't
seem to work. What settings do I need to enable impersonation?
I've copied the
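Not an answer from the thread, but for reference: HiveServer2-style impersonation is controlled by hive.server2.enable.doAs in hive-site.xml. Note that the Spark Thrift server of that era did not fully honor this setting (tracked as SPARK-5159), so it may have no effect there:

```xml
<!-- hive-site.xml: the setting HiveServer2 uses for user impersonation -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
```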
After repartitioning a DataFrame in Spark 1.3.0, I get a .parquet exception
when saving to Amazon's S3. The data that I try to write is 10G.
logsForDate
.repartition(10)
.saveAsParquetFile(destination) // -- Exception here
The exception I receive is:
java.io.IOException: The file being
Do you have an RDD or a DataFrame? RDDs are kind of like tuples; you can
add a new column by applying a map.
RDDs are immutable, so you will get another RDD.
On 1 May 2015 14:59, Carter gyz...@hotmail.com wrote:
Hi all,
I have an RDD with *MANY *columns (e.g., *hundreds*); how do I add one more
column at the
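A minimal sketch of the map-based approach suggested above, representing each row as a Seq[Any]; the sample data and the appended value are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("add-column"))

// Rows with many columns, modeled as Seq[Any]
val rdd = sc.parallelize(Seq(Seq[Any]("a", 1), Seq[Any]("b", 2)))

// RDDs are immutable: map returns a NEW RDD with the extra column appended
val withFlag = rdd.map(row => row :+ true)
```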
OK, consider the case where there are multiple event triggers for a given
customer/vendor/product, like 1,1,2,2,3, arranged in the order of *event*
*occurrence* (timestamp). So the output should be two groups, (1,2) and
(1,2,3). The doublet would be the first occurrences of 1,2 and the triplet
the later occurrences.
Link to the question:
http://stackoverflow.com/questions/29974017/spark-kafka-producer-not-serializable-exception
Thanks for any pointers.
bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow
So the above class is in the jar which was in the classpath?
Can you tell us a bit more about Schema$MyRow ?
On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote:
Hi,
I'm getting a
Yes, this class is present in the jar that was loaded in the classpath
of the executor Java process -- it wasn't even lazily added as a part
of the task execution. Schema$MyRow is a protobuf-generated class.
After doing some digging around, I think I might be hitting up against
SPARK-5470, the
I cherry-picked the fix for SPARK-5470 and the problem has gone away.
On Fri, May 1, 2015 at 9:15 AM, Akshat Aranya aara...@gmail.com wrote:
Yes, this class is present in the jar that was loaded in the classpath
of the executor Java process -- it wasn't even lazily added as a part
of the task
I have the real DEBS-Taxi data in a CSV file. In order to operate over it,
how can I simulate a Spout-like event generator using the timestamps in the
CSV file?
--
Thanks & Regards,
Anshu Shukla
Hi,
Maybe you could use streamingContext.fileStream, as in the example from
https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers;
you can read from files on any file system compatible with the HDFS API
(that is, HDFS, S3, NFS, etc.). You could split
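A minimal sketch of the file-based input approach suggested above; the directory path is illustrative, and textFileStream is the text-file convenience form of fileStream:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("csv-replay")
val ssc = new StreamingContext(conf, Seconds(1))

// Picks up new files as they appear in the monitored directory;
// works on any HDFS-compatible file system (HDFS, S3, NFS, ...)
val lines = ssc.textFileStream("hdfs:///data/taxi-events")

lines.print()
ssc.start()
ssc.awaitTermination()
```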
Oops! well spotted. Many thanks Shixiong.
On Fri, May 1, 2015 at 1:25 AM, Shixiong Zhu zsxw...@gmail.com wrote:
spark.history.fs.logDirectory is for the history server. For Spark
applications, they should use spark.eventLog.dir. Since you commented out
spark.eventLog.dir, it will be
Hi,
We've had a similar problem, but with log4j properties file.
The only working way we've found, was externally deploying the properties
file on the worker machine to the spark conf folder and configuring the
executor jvm options with:
sparkConf.set(spark.executor.extraJavaOptions,
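A sketch of how that truncated sparkConf.set call typically looks, assuming the log4j.properties file has been deployed to /opt/spark/conf on each worker (the path is illustrative):

```scala
sparkConf.set("spark.executor.extraJavaOptions",
  "-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties")
```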
Hi Stephen,
It looks like the Mesos slave was most likely not able to launch some Mesos
helper processes (the fetcher, probably?).
How did you install Mesos? Did you build it from source yourself?
Please install Mesos through a package, or if building from source, actually
run make install and run from the installed
Hi everyone,
I have a spark application that works fine on a standalone Spark cluster
that runs on my laptop
(master and one worker), but fails when I try to run in on a standalone
Spark cluster
deployed on EC2 (master and worker are on different machines).
The application structure goes in the
Hi all,
I encountered strange behavior with the driver memory setting, and was
wondering if some of you experienced it as well, or know what the problem
is.
I want to start a Spark job in the background with spark-submit. If I have
the driver memory setting in my spark-defaults.conf:
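Not from the thread, but for reference: the Spark docs note that in client mode the driver memory cannot be set through SparkConf once the driver JVM is running, and recommend the --driver-memory flag or the defaults file; when a backgrounded spark-submit misbehaves, passing it explicitly on the command line is a common workaround. An illustrative invocation (class name, size, and jar are placeholders):

```shell
spark-submit \
  --class com.example.Main \
  --driver-memory 4g \
  app.jar &
```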
In all the examples, it seems that the Spark application doesn't really do
anything special in order to exit. When I run my application, however, the
spark-submit script just hangs there at the end. Is there something
special I need to do to get that thing to exit normally?
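A common cause (not confirmed from this truncated thread) is that the SparkContext is never stopped, which can keep non-daemon threads alive; a minimal sketch of stopping it explicitly so the driver JVM can exit:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object Main {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("exit-cleanly"))
    try {
      // ... job logic ...
    } finally {
      sc.stop() // release resources so spark-submit can return
    }
  }
}
```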
I have this Spark app that simply needs to do a simple regular join between
two datasets. It works fine with a tiny data set (2.5G input for each
dataset). When I run against 25G of each input and with .partitionBy(new
org.apache.spark.HashPartitioner(200)), I see a NullPointerException with
this trace
*Resending as I do not see that this made it to the mailing list; sorry if
in fact it did and is just not reflected online yet.*
I’m very perplexed by the following. I have a set of Avro-generated
objects that are sent to a Spark Streaming job via Kafka. The Spark Streaming
job follows the