Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-18 Thread pandees waran
All, may I know what exactly changed in 2.1.1 that solved this problem? > On Sep 17, 2017, at 11:08 PM, Anastasios Zouzias wrote: > > Hi, > > I had a similar issue using 2.1.0, but not with Kafka. Updating to 2.1.1 > solved my issue. Can you try with

Re: Spark parquet file read problem !

2017-07-30 Thread pandees waran
I have encountered a similar error when the schema / datatypes conflict between the 2 Parquet files. Are you sure that the 2 individual files have the same structure with matching datatypes? If not, you have to fix this by enforcing default values for the missing values to make the
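
A minimal sketch of the "enforce default values" idea, assuming hypothetical paths and column names: supply one explicit schema when reading, then fill the columns that are absent (and therefore null) in the older file. If the datatypes themselves conflict (e.g. int vs. string), the files may instead need to be read separately and cast before a union.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object ParquetSchemaFix {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("parquet-schema-fix").getOrCreate()

    // Declare the schema both files should conform to; columns missing from one
    // file come back as null and can then be defaulted.
    val schema = StructType(Seq(
      StructField("id", LongType, nullable = false),
      StructField("name", StringType, nullable = true),
      StructField("score", DoubleType, nullable = true)   // assumed present in only one file
    ))

    val df = spark.read.schema(schema)
      .parquet("/data/part1.parquet", "/data/part2.parquet")   // hypothetical paths

    // Enforce a default for the column that is absent/null in the older file.
    val fixed = df.na.fill(Map("score" -> 0.0))
    fixed.printSchema()
    spark.stop()
  }
}
```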

Re: How do I access the nested field in a dataframe, spark Streaming app... Please help.

2016-11-20 Thread pandees waran
Have you tried using the "." access method? e.g.: ds1.select("name","addresses[0].element.city") On Sun, Nov 20, 2016 at 9:59 AM, shyla deshpande wrote: > The following is my dataframe schema > > root > |-- name: string (nullable = true) > |-- addresses: array
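
A hedged sketch of the same idea (the column names are taken from the thread, the input path is made up): for an array-of-structs column, the struct field can be reached with an SQL expression or by exploding the array, without the Parquet-internal "element" wrapper.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, expr}

object NestedFieldAccess {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("nested-field").getOrCreate()
    val ds1 = spark.read.json("/tmp/people.json")   // hypothetical input

    // First element of the array, then its struct field:
    ds1.select(col("name"), expr("addresses[0].city").as("first_city")).show()

    // Or one row per address via explode:
    ds1.select(col("name"), explode(col("addresses")).as("addr"))
       .select(col("name"), col("addr.city"))
       .show()

    spark.stop()
  }
}
```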

Recommended way to run spark streaming in production in EMR

2016-10-11 Thread pandees waran
All, we have a use case in which 2 Spark Streaming jobs run in the same EMR cluster. I am thinking of allowing multiple streaming contexts and running them as 2 separate spark-submits with wait-for-app-completion set to false. With this, the failure detection and monitoring seems obscure and doesn't seem
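
A hedged sketch of that submission pattern, assuming the spark-launcher module is available and using made-up jar paths and class names: SparkLauncher can submit the two streaming apps as independent YARN applications, and spark.yarn.submit.waitAppCompletion=false is the property that makes the submit return without waiting. The returned handles give at least basic state for the monitoring concern raised above.

```scala
import org.apache.spark.launcher.SparkLauncher

object SubmitTwoStreams {
  def main(args: Array[String]): Unit = {
    def submit(jar: String, mainClass: String) =
      new SparkLauncher()
        .setAppResource(jar)                       // hypothetical artifact
        .setMainClass(mainClass)                   // hypothetical class
        .setMaster("yarn")
        .setDeployMode("cluster")
        // Return as soon as the app is accepted instead of waiting for it to finish.
        .setConf("spark.yarn.submit.waitAppCompletion", "false")
        .startApplication()

    val h1 = submit("/jobs/stream-a.jar", "com.example.StreamA")
    val h2 = submit("/jobs/stream-b.jar", "com.example.StreamB")

    // SparkAppHandle exposes coarse application state for basic monitoring.
    println(s"A: ${h1.getState}, B: ${h2.getState}")
  }
}
```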

Re: Is "spark streaming" streaming or mini-batch?

2016-08-23 Thread pandees waran
It's based on the "micro-batching" model. > On Aug 23, 2016, at 8:41 AM, Aseem Bansal wrote: > > I was reading this article https://www.inovex.de/blog/storm-in-a-teacup/ and > it mentioned that spark streaming is actually mini-batch, not actual streaming. >
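
A minimal illustration of the micro-batch model (the 5-second interval and socket source are arbitrary choices for the sketch): incoming records are grouped into fixed-interval batches, and each batch is processed as one RDD rather than record by record.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MicroBatchExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("micro-batch-demo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))   // batch interval = micro-batch size

    // Each 5-second slice of socket data arrives as one RDD.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()   // prints the size of every micro-batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```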

Processing ion formatted messages in spark

2016-07-11 Thread pandees waran
All, has anyone ever worked on processing Ion-formatted messages in Spark? The Ion format is a superset of JSON: all JSON documents are valid Ion, but the reverse is not true. For more details on Ion: http://amznlabs.github.io/ion-docs/ Thanks.
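
A hedged sketch of one way this could look, assuming the ion-java library is on the executor classpath (its package name has varied across releases), one Ion value per input line, and a hypothetical "userId" field in each message:

```scala
import com.amazon.ion.{IonStruct, IonText}
import com.amazon.ion.system.IonSystemBuilder
import org.apache.spark.sql.SparkSession

object IonMessages {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ion-demo").getOrCreate()
    val raw = spark.sparkContext.textFile("/data/ion-messages")   // hypothetical path

    val userIds = raw.mapPartitions { records =>
      // The IonSystem is not serializable, so build it once per partition.
      val ion = IonSystemBuilder.standard().build()
      records.map { text =>
        val struct = ion.getLoader.load(text).get(0).asInstanceOf[IonStruct]
        struct.get("userId").asInstanceOf[IonText].stringValue()   // assumed field
      }
    }
    userIds.take(10).foreach(println)
    spark.stop()
  }
}
```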

Re: spark streaming questions

2016-06-22 Thread pandees waran
For my question (2): from my understanding, checkpointing ensures recovery from failures. > On Jun 22, 2016, at 10:27 AM, pandees waran <pande...@gmail.com> wrote: > > In general, if you have multiple steps in a workflow: > For every batch > 1.str
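
A minimal sketch of checkpoint-based recovery (the checkpoint directory and socket source are assumed for illustration): on restart, StreamingContext.getOrCreate rebuilds the context and its pending batches from the checkpoint directory instead of the setup function.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointRecovery {
  val checkpointDir = "s3a://my-bucket/checkpoints/app1"   // assumed location

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("checkpointed-stream")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Use the checkpoint if one exists, otherwise build a fresh context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```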

Re: spark streaming questions

2016-06-22 Thread pandees waran
about your > use case. > > Cheers, > > Dr Mich Talebzadeh > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > http://talebzadehmich.wordpress.com > >> On 22 June 2016 at 15:54, p

spark streaming questions

2016-06-22 Thread pandees waran
Hello all, I have a few questions regarding spark streaming: * I am wondering whether anyone uses spark streaming with workflow orchestrators such as Data Pipeline/SWF/any other framework. Are there any advantages/drawbacks to using a workflow orchestrator for spark streaming? * How do you guys manage

identifying newly arrived files in s3 in spark streaming

2016-06-06 Thread pandees waran
I am fairly new to spark streaming and I have a basic question on how spark streaming works on an S3 bucket which periodically gets new files once every 10 minutes. When I use spark streaming to process the files in this S3 path, will it process all the files in this path (old + new files) every
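
A hedged sketch of the file-stream source for this case (bucket and prefix are made up): file-based streams only pick up files whose timestamps fall inside the current batch window, so objects already present in the prefix are ignored and each newly arriving file is processed once.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

object S3NewFiles {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("s3-new-files")
    val ssc = new StreamingContext(conf, Minutes(10))   // matches the 10-minute arrival rate

    // Monitors the prefix and creates an RDD only from newly appearing files.
    val lines = ssc.textFileStream("s3a://my-bucket/incoming/")
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```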

Re: Spark Language / Data Base Question

2015-06-25 Thread pandees waran
There's no single best answer to these questions. The question could be refined with a specific use case, and then framed as: which data store is best for that use case? On Jun 25, 2015, at 12:02 AM, Sinha, Ujjawal (SFO-MAP) ujjawal.si...@cadreon.com wrote: Hi guys, I am very new to Spark and I have 2 question

Equivalent functions for NVL() and CASE expressions in Spark SQL

2014-07-17 Thread pandees waran
Do we have any equivalent Scala functions available for NVL() and CASE expressions to use in Spark SQL?
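
A hedged sketch against the later DataFrame API (the thread predates it): coalesce() plays the role of NVL(), and when()/otherwise() covers CASE; plain SQL with COALESCE and CASE WHEN also works. Column names here are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit, when}

object NvlAndCase {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("nvl-case").getOrCreate()
    import spark.implicits._

    val df = Seq((Some(10), "A"), (None, "B")).toDF("amount", "grade")

    df.select(
      // NVL(amount, 0)
      coalesce(col("amount"), lit(0)).as("amount_nvl"),
      // CASE WHEN grade = 'A' THEN 'top' ELSE 'other' END
      when(col("grade") === "A", "top").otherwise("other").as("grade_bucket")
    ).show()

    // Plain SQL works too:
    df.createOrReplaceTempView("t")
    spark.sql("SELECT COALESCE(amount, 0) AS amount_nvl, " +
              "CASE WHEN grade = 'A' THEN 'top' ELSE 'other' END AS grade_bucket FROM t").show()

    spark.stop()
  }
}
```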

Read all the columns from a file in spark sql

2014-07-16 Thread pandees waran
Hi, I am a newbie to Spark SQL and I would like to know how to read all the columns from a file in Spark SQL. I have referred to the programming guide here: http://people.apache.org/~tdas/spark-1.0-docs/sql-programming-guide.html The example says: val people =
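
A hedged sketch in the style of that guide, not the guide's exact example (the file layout name,age,city and the path are assumed, and it uses the slightly later 1.3+ toDF API rather than 1.0's createSchemaRDD): every column of each line is mapped into a case class, and SELECT * then returns all columns without listing them.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int, city: String)

object ReadAllColumns {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("read-all-columns"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Assumed input: comma-separated lines of name,age,city
    val people = sc.textFile("/data/people.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt, p(2).trim))
      .toDF()

    people.registerTempTable("people")
    // SELECT * reads back every column.
    sqlContext.sql("SELECT * FROM people").collect().foreach(println)
  }
}
```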