Re: Spark Streaming fails with unable to get records after polling for 512 ms

2017-11-15 Thread jagadish kagitala
Hi Cody, it worked after moving the parameter to sparkConf; I don't see that error anymore. But now the count for each RDD returns 0, even though there are records in the topic I'm reading. Do you see anything wrong with how I'm creating the Direct Stream? Thanks, Jagadish On Wed, Nov 15, 2017
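For reference, a minimal sketch of how a Kafka direct stream is typically created with the spark-streaming-kafka-0-10 integration; the broker, topic, and group id below are placeholders, not values from this thread:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val conf = new SparkConf()
      .setAppName("kafka-direct-example")
      .set("spark.streaming.kafka.consumer.poll.ms", "10000") // Spark setting, not a Kafka param

    val ssc = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",         // placeholder
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",                 // placeholder
      "auto.offset.reset" -> "earliest",             // with "latest", empty batches are expected until new data arrives
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("example-topic"), kafkaParams))

    // Count records per batch.
    stream.foreachRDD(rdd => println(s"records in batch: ${rdd.count()}"))

    ssc.start()
    ssc.awaitTermination()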

Access to Applications metrics

2017-11-15 Thread Nick Dimiduk
Hello, I'm wondering if it's possible to get access to the detailed job/stage/task-level metrics via the metrics system (JMX, Graphite, etc.). I've enabled the wildcard sink and I do not see them. It seems these values are only available over HTTP/JSON and to SparkListener instances; is this the
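For illustration, a minimal sketch of pulling task-level metrics through a SparkListener, since the JMX/Graphite sinks do not export them; the class name and the fields logged are only examples:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
    import org.apache.spark.sql.SparkSession

    // Logs a few task-level metrics as each task finishes.
    class TaskMetricsListener extends SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val m = taskEnd.taskMetrics
        if (m != null) {
          println(s"stage=${taskEnd.stageId} task=${taskEnd.taskInfo.taskId} " +
            s"runTimeMs=${m.executorRunTime} recordsRead=${m.inputMetrics.recordsRead}")
        }
      }
    }

    val spark = SparkSession.builder().appName("listener-example").getOrCreate()
    spark.sparkContext.addSparkListener(new TaskMetricsListener)
    // A listener class can also be registered via the spark.extraListeners configuration.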

[Spark Core]: S3a with Openstack swift object storage not using credentials provided in sparkConf

2017-11-15 Thread Marius
Hey, I am currently using Spark 2.2.0 for Hadoop 2.7.x in a standalone cluster for testing. I want to access some files and share them on the nodes of the cluster using addFiles. As local directories are not supported for this, I want to use S3 to do the job. In contrast to nearly
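A minimal sketch, assuming the credentials are meant to reach S3A through the Hadoop configuration: keys prefixed with spark.hadoop. on SparkConf are forwarded to the Hadoop Configuration that the s3a filesystem reads. Endpoint, keys, and paths below are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    val conf = new SparkConf()
      .setAppName("s3a-swift-example")
      .set("spark.hadoop.fs.s3a.endpoint", "https://swift.example.org")  // placeholder endpoint
      .set("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")               // placeholder
      .set("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")               // placeholder
      .set("spark.hadoop.fs.s3a.path.style.access", "true")              // often needed for non-AWS stores (Hadoop 2.8+)

    val spark = SparkSession.builder().config(conf).getOrCreate()
    // Files added this way are fetched by every node and resolved via SparkFiles.get(...).
    spark.sparkContext.addFile("s3a://my-bucket/shared/config.json")     // placeholder path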

Parquet files from spark not readable in Cascading

2017-11-15 Thread Vikas Gandham
Hi, when I try reading Parquet data that was generated by Spark in Cascading, it throws the following error: Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file "" at
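One workaround commonly tried when non-Spark Parquet readers fail on Spark-written files is the legacy write format; whether it addresses this particular Cascading error is an assumption, not something confirmed in the thread. Paths are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("parquet-legacy-example").getOrCreate()
    // Write Parquet in the older parquet-mr compatible layout.
    spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")
    spark.read.json("/data/in")        // placeholder input
      .write.parquet("/data/out")      // placeholder output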

Re: Spark Streaming fails with unable to get records after polling for 512 ms

2017-11-15 Thread Cody Koeninger
spark.streaming.kafka.consumer.poll.ms is a Spark configuration, not a Kafka parameter. See http://spark.apache.org/docs/latest/configuration.html On Tue, Nov 14, 2017 at 8:56 PM, jkagitala wrote: > Hi, > > I'm trying to add spark-streaming to our kafka topic. But, I keep
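For illustration, the setting belongs on the Spark configuration (SparkConf, or --conf on spark-submit), not in the Kafka parameter map; the timeout value is only an example:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("streaming-app")
      // How long Spark's cached Kafka consumer polls for records before failing the task.
      .set("spark.streaming.kafka.consumer.poll.ms", "10000")
    // equivalently: spark-submit --conf spark.streaming.kafka.consumer.poll.ms=10000 ...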

Re: Process large JSON file without causing OOM

2017-11-15 Thread Alec Swan
Thanks Steve and Vadim for the feedback. @Steve, are you suggesting creating a custom receiver and somehow piping it through Spark Streaming/Spark SQL? Or are you suggesting creating smaller datasets from the stream and using my original code to process smaller datasets? It'd be very helpful for

Restart Spark Streaming after deployment

2017-11-15 Thread KhajaAsmath Mohammed
Hi, I am new to the usage of Spark Streaming. I have developed one Spark Streaming job which runs every 30 minutes with a checkpointing directory. I have to implement a minor change; shall I kill the Spark Streaming job once the batch is completed, using the yarn application -kill command, and update the
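A sketch of one common pattern, assuming the job runs on YARN: enable graceful shutdown so the in-flight batch completes when the application is killed, then resubmit the updated jar. The configuration key is real; whether it is sufficient for this redeploy, and whether the existing checkpoint can be reused after a code change, is not established here (checkpoints generally do not survive code changes). Paths are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, StreamingContext}

    val conf = new SparkConf()
      .setAppName("streaming-job")
      // Finish the current batch before exiting on SIGTERM (e.g. yarn application -kill).
      .set("spark.streaming.stopGracefullyOnShutdown", "true")

    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(conf, Minutes(30))
      ssc.checkpoint("hdfs:///checkpoints/streaming-job")   // placeholder checkpoint dir
      // ... define the DStream pipeline here ...
      ssc
    }

    // Recover from the checkpoint if present, otherwise build a fresh context.
    val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/streaming-job", createContext _)
    ssc.start()
    ssc.awaitTermination()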

Re: Process large JSON file without causing OOM

2017-11-15 Thread Vadim Semenov
There's a lot of off-heap memory involved in decompressing Snappy and compressing with ZLib. Since you're running with `local[*]`, you process multiple tasks simultaneously, so they all might consume memory. I don't think that increasing the heap will help, since it looks like you're hitting system memory
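A sketch of the kind of mitigation this points at, i.e. capping concurrent local tasks instead of growing the heap; the master string and memory value are illustrative assumptions:

    import org.apache.spark.sql.SparkSession

    // Fewer concurrent tasks means fewer compression buffers live off-heap at once.
    // Driver heap itself must be set before the JVM starts, e.g.
    //   spark-submit --driver-memory 4g ...   (value illustrative)
    val spark = SparkSession.builder()
      .appName("local-memory-example")
      .master("local[2]")          // instead of local[*]
      .getOrCreate()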

Re: Process large JSON file without causing OOM

2017-11-15 Thread Steve Loughran
On 14 Nov 2017, at 15:32, Alec Swan wrote: But I wonder if there is a way to stream/batch the content of the JSON file in order to convert it to ORC piecemeal and avoid reading the whole JSON file into memory in the first place? That is what
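A minimal sketch of letting Spark SQL do both the JSON read and the ORC write, so the conversion happens partition by partition rather than by materializing the whole document in application code; the paths and the assumption of line-delimited JSON are illustrative:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("json-to-orc").getOrCreate()

    // Line-delimited JSON lets Spark split the input and convert it task by task.
    val df = spark.read.json("/data/big.json")   // placeholder path
    df.repartition(32)                           // illustrative; spread the write across smaller tasks
      .write.orc("/data/out.orc")                // placeholder path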

spark structured csv file stream not detecting new files

2017-11-15 Thread Imran Rajjad
Greetings, I am running a unit test designed to stream a folder into which I am manually copying CSV files. The files do not always get picked up; they only get detected when the job starts with the files already in the folder. I even tried using the fileNameOnly option newly included in 2.2.0.
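For reference, a minimal sketch of a CSV file-source stream with the fileNameOnly option mentioned above; the schema, folder, and console sink are placeholders for the test setup:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

    val spark = SparkSession.builder().appName("csv-stream-test").getOrCreate()

    // The file source requires an explicit schema when streaming.
    val schema = new StructType()
      .add("id", IntegerType)
      .add("value", StringType)

    val stream = spark.readStream
      .schema(schema)
      .option("header", "true")
      .option("fileNameOnly", "true")   // Spark 2.2+: track already-seen files by name only
      .csv("/data/incoming")            // placeholder folder being watched

    val query = stream.writeStream
      .format("console")
      .start()
    query.awaitTermination()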