Hi Cody,
It worked after moving the parameter to sparkConf; I don't see that error anymore.
But now the count for each RDD returns 0, even though there are
records in the topic I'm reading.
Do you see anything wrong with how I'm creating the Direct Stream?
Thanks
Jagadish
On Wed, Nov 15, 2017
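For reference, a minimal Direct Stream setup against the spark-streaming-kafka-0-10 integration looks roughly like the sketch below (broker address, topic, and group id are placeholders). A common cause of empty RDDs is auto.offset.reset defaulting to "latest", so a new consumer group only sees records produced after the job starts:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("direct-stream-check")
  // Spark-side setting: goes on the SparkConf, not in kafkaParams
  .set("spark.streaming.kafka.consumer.poll.ms", "10000")
val ssc = new StreamingContext(conf, Seconds(30))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",   // placeholder
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "test-group",             // placeholder
  // "latest" (the default) skips records already in the topic;
  // "earliest" reads from the beginning for a new consumer group
  "auto.offset.reset" -> "earliest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))

stream.foreachRDD { rdd => println(s"records in batch: ${rdd.count()}") }
ssc.start()
ssc.awaitTermination()
```

Requires a running Kafka broker, so it is a sketch rather than a runnable test.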
Hello,
I'm wondering if it's possible to get access to the detailed job/stage/task
level metrics via the metrics system (JMX, Graphite, etc.). I've enabled the
wildcard sink and I do not see them. It seems these values are only
available over http/json and to SparkListener instances, is this the
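As far as I know that is the case: per-task values surface through the REST API and SparkListener rather than the metrics sinks. A minimal listener sketch (the class name and the choice of fields to forward are illustrative) that could feed your own Graphite/JMX reporter:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class TaskMetricsListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      // forward these values to your own reporter instead of printing
      println(s"stage=${taskEnd.stageId} runTime=${m.executorRunTime}ms " +
        s"recordsRead=${m.inputMetrics.recordsRead}")
    }
  }
}

// register it on the SparkContext:
// sc.addSparkListener(new TaskMetricsListener)
```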
Hey,
I am currently using Spark 2.2.0 for Hadoop 2.7.x in a Standalone
cluster for testing. I want to access some files and share them on the
nodes of the cluster using addFiles. As local directories are not
supported for this, I want to use S3 to do the job.
In contrast to nearly
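For what it's worth, the usual pattern looks like the sketch below (bucket and file names are placeholders; it assumes the hadoop-aws and AWS SDK jars are on the classpath and credentials are configured):

```scala
import org.apache.spark.SparkFiles

// distribute the file from S3 to every node
sc.addFile("s3a://my-bucket/config/lookup.txt")

// inside a task on any node, resolve the local copy by file name
val localPath = SparkFiles.get("lookup.txt")
```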
Hi,
When I try reading Parquet data that was generated by Spark in Cascading,
it throws the following error:
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read
value at 0 in block -1 in file ""
at
spark.streaming.kafka.consumer.poll.ms is a Spark configuration, not
a Kafka parameter.
see http://spark.apache.org/docs/latest/configuration.html
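Concretely, it belongs on the SparkConf (or in spark-defaults.conf), not in the Kafka parameter map; a config sketch with an illustrative value:

```
# spark-defaults.conf
# how long the cached Kafka consumer polls for data before timing out
spark.streaming.kafka.consumer.poll.ms  10000
```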
On Tue, Nov 14, 2017 at 8:56 PM, jkagitala wrote:
> Hi,
>
> I'm trying to add spark-streaming to our kafka topic. But, I keep
Thanks Steve and Vadim for the feedback.
@Steve, are you suggesting creating a custom receiver and somehow piping it
through Spark Streaming/Spark SQL? Or are you suggesting creating smaller
datasets from the stream and using my original code to process smaller
datasets? It'd be very helpful for
Hi,
I am new to Spark Streaming. I have developed one Spark
Streaming job which runs every 30 minutes with a checkpointing directory.
I have to implement a minor change; shall I kill the Spark Streaming job once
the batch is completed using the yarn application -kill command and update the
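For what it's worth, the usual pattern is to enable graceful shutdown (spark.streaming.stopGracefullyOnShutdown=true) so in-flight batches finish, then kill via YARN; the application id and checkpoint path below are placeholders. Note that checkpoint recovery generally does not survive a code change, so after deploying the modified jar you typically clear the old checkpoint:

```
# stop the running streaming job
yarn application -kill application_1510000000000_0001

# after deploying the changed jar, remove the old checkpoint if the
# code change is incompatible with checkpoint recovery
hdfs dfs -rm -r /checkpoints/my-streaming-job
```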
There's a lot of off-heap memory involved in decompressing Snappy and
compressing ZLib.
Since you're running using `local[*]`, you process multiple tasks
simultaneously, so they all might consume memory.
I don't think that increasing heap will help, since it looks like you're
hitting system memory
On 14 Nov 2017, at 15:32, Alec Swan
> wrote:
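If that's the cause, reducing the number of concurrent tasks is one mitigation, e.g. by running with fewer local cores; a sketch where the values, main class, and jar name are all illustrative:

```
# fewer simultaneous tasks means less native memory in use at once
spark-submit --master local[2] --driver-memory 4g \
  --class com.example.Main app.jar
```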
But I wonder if there is a way to stream/batch the content of the JSON file in
order to convert it to ORC piecemeal and avoid reading the whole JSON file into
memory in the first place?
That is what
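One way to avoid materializing the whole file on a single JVM, assuming the input is line-delimited JSON, is to let Spark SQL read it as a DataFrame (processed partition by partition) and write ORC directly; the paths and schema are placeholders:

```scala
// supplying a schema avoids a separate inference pass over the data
val df = spark.read
  .schema(mySchema)                       // placeholder schema
  .json("hdfs:///data/in/records.json")   // placeholder input path

df.write.format("orc").save("hdfs:///data/out/orc")
```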
Greetings,
I am running a unit test designed to stream a folder where I am manually
copying csv files. The files do not always get picked up. They only get
detected when the job starts with the files already in the folder.
I even tried the fileNameOnly option newly added in 2.2.0.
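Two things worth checking with the file source: files must appear atomically in the watched directory (write them elsewhere and move/rename them in), and by default the source keys on path plus modification time, which fileNameOnly changes to name-only matching. A minimal sketch with a placeholder schema and directory:

```scala
import org.apache.spark.sql.types._

val schema = new StructType()
  .add("id", IntegerType)
  .add("name", StringType)

val df = spark.readStream
  .schema(schema)                    // file sources require an explicit schema
  .option("fileNameOnly", "true")    // match files by name only (Spark 2.2+)
  .csv("/data/incoming")             // placeholder watched directory

// write new files atomically: create them elsewhere,
// then move them into /data/incoming
```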