Re: share datasets across multiple spark-streaming applications for lookup

2017-10-31 Thread Joseph Pride
Folks: SnappyData. I’m fairly new to working with it myself, but it looks pretty promising. It marries Spark with a co-located in-memory GemFire (or GemFire-derived) database, so you can access the data with SQL, JDBC, ODBC (if you want to go Enterprise instead of open-source) or natively

Re: share datasets across multiple spark-streaming applications for lookup

2017-10-31 Thread Gene Pang
Hi, Alluxio enables sharing dataframes across different applications. This blog post talks about dataframes and Alluxio, and this Spark Summit presentation

Re: share datasets across multiple spark-streaming applications for lookup

2017-10-31 Thread Revin Chalil
Any info on the below would be really appreciated. I read about Alluxio and Ignite. Has anybody used either of them? Do they work well with multiple apps doing lookups simultaneously? Are there better options? Thank you. From: roshan joe Date: Monday, October 30, 2017 at

Fwd: Regarding column partitioning IDs and names as per hierarchical level SparkSQL

2017-10-31 Thread Aakash Basu
Hey all, Any help in the below please? Thanks, Aakash. -- Forwarded message -- From: Aakash Basu Date: Tue, Oct 31, 2017 at 9:17 PM Subject: Regarding column partitioning IDs and names as per hierarchical level SparkSQL To: user

Regarding column partitioning IDs and names as per hierarchical level SparkSQL

2017-10-31 Thread Aakash Basu
Hi all, I have to generate a table with Spark-SQL with the following columns:

Level One Id: VARCHAR(20) NULL
Level One Name: VARCHAR(50) NOT NULL
Level Two Id: VARCHAR(20) NULL
Level Two Name: VARCHAR(50) NULL
Level Three Id: VARCHAR(20) NULL
Level Three Name: VARCHAR(50) NULL
Level Four
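A minimal sketch of what such a table implies, in plain Python rather than Spark-SQL: flattening a parent-child hierarchy into "Level N Id / Level N Name" columns, padding the deeper NULLable levels. The node ids, names, and helper function here are invented for illustration only.

```python
# Hypothetical hierarchy: child -> (parent, name). Data is made up.
nodes = {
    "L1": (None, "Electronics"),
    "L2": ("L1", "Phones"),
    "L3": ("L2", "Smartphones"),
}

def level_columns(node_id, max_levels=4):
    # Walk up to the root, then emit (id, name) pairs top-down,
    # padding missing levels with None (the columns are NULLable).
    chain = []
    while node_id is not None:
        parent, name = nodes[node_id]
        chain.append((node_id, name))
        node_id = parent
    chain.reverse()
    chain += [(None, None)] * (max_levels - len(chain))
    return chain

row = level_columns("L3")
# row[0] is (Level One Id, Level One Name); row[3] is the padded Level Four.
```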

Re: Spark job's application tracking URL not accessible from docker container

2017-10-31 Thread Harsh
Hi, I am facing the same issue while launching the application inside a docker container. Kind Regards Harsh

Bucket vs repartition

2017-10-31 Thread אורן שמון
Hi all, I have 2 Spark jobs: one is a pre-process and the second is the process. The process job needs to do a per-user calculation over the data. I want to avoid a shuffle like groupBy, so I am thinking about either saving the result of the pre-process bucketed by user in Parquet, or re-partitioning by user and saving the
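The idea behind bucketing by user can be sketched in plain Python (not Spark; Spark's real bucketing uses a Murmur3 hash, and the names and data below are invented): if both jobs assign each user to `hash(user) % NUM_BUCKETS`, the process job finds all rows for a user already co-located in one bucket, so no cross-bucket data movement (i.e. no shuffle) is needed.

```python
NUM_BUCKETS = 4

def bucket_of(user_id):
    # Stand-in for Spark's bucketing hash; consistent within one run.
    return hash(user_id) % NUM_BUCKETS

rows = [("alice", 10), ("bob", 7), ("alice", 3), ("carol", 5)]

# Pre-process job: write each row into its user's bucket.
buckets = {i: [] for i in range(NUM_BUCKETS)}
for user, value in rows:
    buckets[bucket_of(user)].append((user, value))

# Process job: each user's rows live in exactly one bucket, so a
# per-user aggregation never has to exchange data across buckets.
totals = {}
for bucket in buckets.values():
    for user, value in bucket:
        totals[user] = totals.get(user, 0) + value
```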

Read parquet files as buckets

2017-10-31 Thread אורן שמון
Hi all, I have Parquet files as the result of a job that saved them in bucket mode by userId. How can I read the files in bucket mode in another job? I tried to read them, but it didn't bucket the data (same user in same partition)
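A hedged sketch of the usual answer (assumes pyspark; the table name, column, and bucket count are made up): Spark stores the bucket spec in the metastore, not in the Parquet files themselves, so a bucketed table generally has to be written with `saveAsTable` and read back with `spark.table()` rather than `spark.read.parquet(path)`, which sees only plain files.

```python
NUM_BUCKETS = 16          # must match between the writer and reader jobs
TABLE = "users_bucketed"  # hypothetical table name

try:
    from pyspark.sql import SparkSession  # optional: sketch stays importable
except ImportError:
    SparkSession = None

def write_bucketed(df):
    # Writer job: a plain df.write.parquet(path) would drop the bucket spec.
    (df.write
       .bucketBy(NUM_BUCKETS, "userId")
       .sortBy("userId")
       .format("parquet")
       .saveAsTable(TABLE))

def read_bucketed(spark):
    # Reader job: spark.table() recovers the bucket spec from the metastore,
    # so a later groupBy("userId") can avoid a shuffle.
    return spark.table(TABLE)
```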

Spark job's application tracking URL not accessible from docker container

2017-10-31 Thread Divya Narayan
We have streaming jobs and batch jobs running inside docker containers, with the Spark driver launched within the container. Now when we open the Resource Manager UI http://:8080 and try to access the application tracking URL of any running job, the page times out with error: HTTP ERROR 500
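One common fix worth trying (a sketch only; the image name, port, and flags here are examples, not taken from this thread): the tracking URL proxies to the driver's web UI inside the container, so the container either needs host networking or a published UI port, plus a hostname the proxy can resolve.

```shell
# Run the driver container on the host network and advertise a reachable
# hostname, so the RM proxy can reach the driver UI (default port 4040).
docker run --network host \
  -e SPARK_PUBLIC_DNS="$(hostname -f)" \
  my-spark-image \
  spark-submit --conf spark.ui.port=4040 ...
```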