Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Vishnu Viswanath
Thank you Yanbo. It looks like this is available only in version 1.6. Can you tell me how/when I can download version 1.6? Thanks and Regards, Vishnu Viswanath, On Wed, Dec 2, 2015 at 4:37 AM, Yanbo Liang wrote: > You can set "handleInvalid" to "skip" which help you skip
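For reference, a minimal sketch of the setting being discussed, against the Spark 1.6 Scala API (the column names and DataFrames are hypothetical):

    import org.apache.spark.ml.feature.StringIndexer

    // "skip" drops rows whose label was not seen during fit(), instead of throwing
    val indexer = new StringIndexer()
      .setInputCol("category")        // hypothetical input column
      .setOutputCol("categoryIndex")
      .setHandleInvalid("skip")
    val indexed = indexer.fit(trainingDF).transform(testDF)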

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Cody Koeninger
Again, just to be clear, silently throwing away data because your system isn't working right is not the same as "recover from any Kafka leader changes and offset out of ranges issue". On Tue, Dec 1, 2015 at 11:27 PM, Dibyendu Bhattacharya < dibyendu.bhattach...@gmail.com> wrote: > Hi, if you

Re: Jupyter configuration

2015-12-02 Thread Don Drake
Here's what I set in a shell script to start the notebook:

    export PYSPARK_PYTHON=~/anaconda/bin/python
    export PYSPARK_DRIVER_PYTHON=~/anaconda/bin/ipython
    export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

If you want to use HiveContext w/CDH:

    export HADOOP_CONF_DIR=/etc/hive/conf

Then just run

Retrieving the PCA parameters in pyspark

2015-12-02 Thread Rohit Girdhar
Hi I'm using PCA through the python interface for spark, as per the instructions on this page: https://spark.apache.org/docs/1.5.1/ml-features.html#pca It works fine for learning the parameters and transforming the data. However, I'm unable to find a way to retrieve the learnt PCA parameters. I
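For what it's worth, the Scala side of the API does expose the learnt model; a sketch assuming Spark 1.5's ml.feature.PCA, whose fitted PCAModel carries the principal-components matrix as pc (the Python wrapper did not appear to surface this at the time):

    import org.apache.spark.ml.feature.PCA

    // a sketch: df is assumed to be a DataFrame with a vector column "features"
    val pca = new PCA()
      .setInputCol("features")
      .setOutputCol("pcaFeatures")
      .setK(3)
    val model = pca.fit(df)
    println(model.pc)   // the learnt components, one column per principal component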

create DataFrame from RDD

2015-12-02 Thread Zsolt Tóth
Hi, I have a Spark job with many transformations (sequence of maps and mapPartitions) and only one action in the end (DataFrame.write()). The transformations return an RDD, so I need to create a DataFrame. To be able to use sqlContext.createDataFrame() I need to know the schema of the Row but for
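A sketch of the explicit-schema route, assuming the row layout is known up front (the field names and input RDD are hypothetical):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    // build the schema by hand instead of inferring it from the Rows
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("count", IntegerType, nullable = false)))
    // rdd is assumed to be an RDD[(String, Int)] produced by the transformations
    val rowRDD = rdd.map { case (name, count) => Row(name, count) }
    val df = sqlContext.createDataFrame(rowRDD, schema)
    df.write.parquet("hdfs:///output/path")   // hypothetical output path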

Re: Spark Streaming and JMS

2015-12-02 Thread SamyaMaiti
Hi All, Is there any Pub-Sub for JMS provided by Spark out of the box, like Kafka? Thanks. Regards, Sam -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-and-JMS-tp5371p25548.html Sent from the Apache Spark User List mailing list archive at

default parallelism and mesos executors

2015-12-02 Thread Adrian Bridgett
Using parallelize() on a dataset I'm only seeing two tasks rather than the number of cores in the Mesos cluster. This is with spark 1.5.1 and using the mesos coarse grained scheduler. Running pyspark in a console seems to show that it's taking a while before the mesos executors come online
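One workaround worth noting: the partition count can be forced rather than left to spark.default.parallelism, which on coarse-grained Mesos appears to track the cores registered at the time. A sketch:

    // ask for an explicit number of partitions instead of the default
    val rdd = sc.parallelize(1 to 1000000, numSlices = 64)
    println(rdd.partitions.length)   // 64, regardless of how many executors are up yet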

Re: spark-ec2 vs. EMR

2015-12-02 Thread Alexander Pivovarov
Do you think it's a security issue if EMR is started in a VPC with a subnet having Auto-assign Public IP: Yes? You can remove all inbound rules having a 0.0.0.0/0 source in the master and slave Security Groups, so the master and slave boxes will be accessible only to users who are on the VPN. On Wed, Dec 2, 2015

Re: ClassLoader resources on executor

2015-12-02 Thread Marcelo Vanzin
On Tue, Dec 1, 2015 at 12:45 PM, Charles Allen wrote: > Is there a way to pass configuration file resources to be resolvable through > the classloader? Not in general. If you're using YARN, you can cheat and use "spark.yarn.dist.files" which will place those files
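A sketch of the YARN workaround mentioned here (the file path and app name are hypothetical); files listed under spark.yarn.dist.files are shipped to each container's working directory:

    import org.apache.spark.SparkConf

    // ship a local config file to every YARN container alongside the app
    val conf = new SparkConf()
      .setAppName("my-app")   // hypothetical app name
      .set("spark.yarn.dist.files", "/local/path/app-config.xml")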

Re: spark-ec2 vs. EMR

2015-12-02 Thread Dana Powers
EMR was a pain to configure on a private VPC last I tried. Has anyone had success with that? I found spark-ec2 easier to use w private networking, but also agree that I would use for prod. -Dana On Dec 1, 2015 12:29 PM, "Alexander Pivovarov" wrote: > 1. Emr 4.2.0 has

Spark Streaming Use Cases

2015-12-02 Thread Priya Ch
Hi All, I have the following use case for Spark Streaming - there are 2 streams of data, say FlightBookings and Ticket. For each ticket, I need to associate it with the relevant Booking info. There are distinct applications for Booking and Ticket. The Booking streaming application processes the
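If the two streams can be keyed on a shared id, a per-batch join is the usual starting point. A minimal sketch with hypothetical case classes carrying a common bookingId; note this only matches records that land in the same batch, so accumulating state across batches needs updateStateByKey or similar:

    // key both DStreams on the shared id, then join them batch by batch
    val bookingsById = bookingStream.map(b => (b.bookingId, b))
    val ticketsById  = ticketStream.map(t => (t.bookingId, t))
    val joined = ticketsById.join(bookingsById)   // DStream[(Id, (Ticket, Booking))]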

Re: SparkSQL API to insert DataFrame into a static partition?

2015-12-02 Thread Michael Armbrust
you might also coalesce to 1 (or some small number) before writing to avoid creating a lot of files in that partition if you know that there is not a ton of data. On Wed, Dec 2, 2015 at 12:59 AM, Rishi Mishra wrote: > As long as all your data is being inserted by Spark ,
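A sketch of that suggestion, with a hypothetical static-partition path:

    // shrink to one output file before writing into the partition directory
    df.coalesce(1)
      .write
      .mode("append")
      .parquet("/data/mytable/date=2015-12-02")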

Re: possible bug spark/python/pyspark/rdd.py portable_hash()

2015-12-02 Thread Andy Davidson
Hi Ted and Felix From: Ted Yu Date: Sunday, November 29, 2015 at 10:37 AM To: Andrew Davidson Cc: Felix Cheung , "user @spark" Subject: Re: possible bug spark/python/pyspark/rdd.py

Re: spark-ec2 vs. EMR

2015-12-02 Thread Jerry Lam
Hi Dana, Yes, we got VPC + EMR working, but I'm not the person who deploys it. It is related to the subnet, as Alex points out. Just to add another point: spark-ec2 is nice to keep and improve because it allows users to run any version of Spark (a nightly build, for example). EMR does not allow you

Re: Question about yarn-cluster mode and spark.driver.allowMultipleContexts

2015-12-02 Thread Marcelo Vanzin
On Tue, Dec 1, 2015 at 9:43 PM, Anfernee Xu wrote: > But I have a single server(JVM) that is creating SparkContext, are you > saying Spark supports multiple SparkContext in the same JVM? Could you > please clarify on this? I'm confused. Nothing you said so far requires

Re: Kafka - streaming from multiple topics

2015-12-02 Thread Cody Koeninger
Use the direct stream. You can put multiple topics in a single stream, and differentiate them on a per-partition basis using the offset range. On Wed, Dec 2, 2015 at 2:13 PM, dutrow wrote: > I found the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-2388 > > It
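A sketch of the pattern described here, against the Spark 1.5/1.6 Kafka direct API: partition i of each batch RDD corresponds to offsetRanges(i), which carries the topic name (the topic names and kafkaParams are hypothetical):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    // one direct stream over several topics, re-attaching the topic per partition
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("topicA", "topicB"))
    val withTopic = stream.transform { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.mapPartitionsWithIndex { (i, iter) =>
        iter.map { case (_, value) => (ranges(i).topic, value) }
      }
    }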

Re: Kafka - streaming from multiple topics

2015-12-02 Thread dutrow
My need is similar; I have 10+ topics and don't want to dedicate 10 cores to processing all of them. Like you and others, I see that the (String, String) pair that comes out of the DStream has (null, StringData...) values instead of (topic name, StringData...). Did anyone ever find a way around this

Re: Kafka - streaming from multiple topics

2015-12-02 Thread Dan Dutrow
Sigh... I want to use the direct stream and have recently brought in Redis to persist the offsets, but I really like and need to have realtime metrics on the GUI, so I'm hoping to have Direct and Receiver stream both working. On Wed, Dec 2, 2015 at 3:17 PM Cody Koeninger

Re: Kafka - streaming from multiple topics

2015-12-02 Thread dutrow
I found the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-2388 It was marked as invalid. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-streaming-from-multiple-topics-tp8678p25550.html Sent from the Apache Spark User List mailing list

ML - LinearRegression: is this a bug ????

2015-12-02 Thread Saif.A.Ellafi
Data:

    +-------------------+--------------------+
    |              label|            features|
    +-------------------+--------------------+
    |0.13271745268556925|[-0.2006809895664...|
    |0.23956421080605234|[-0.0938342314459...|
    |0.47464690691431843|[0.14124846466227...|
    |

Re: ClassLoader resources on executor

2015-12-02 Thread Charles Allen
I still have to propagate the file into the directory somehow, and also that's marked as only for legacy jobs (deprecated?), so no, I have not experimented with it yet. On Wed, Dec 2, 2015 at 12:53 AM Rishi Mishra wrote: > Did you try to use

Re: SparkSQL API to insert DataFrame into a static partition?

2015-12-02 Thread Rishi Mishra
As long as all your data is being inserted by Spark, hence using the same hash partitioner, what Fengdong mentioned should work. On Wed, Dec 2, 2015 at 9:32 AM, Fengdong Yu wrote: > Hi > you can try: > > if your table under location "/test/table/" on HDFS > and has

Re: sparkSQL Load multiple tables

2015-12-02 Thread Jeff Zhang
Do you want to load multiple tables using SQL? JdbcRelation can currently load only a single table; it doesn't accept SQL as the loading command. On Wed, Dec 2, 2015 at 4:33 PM, censj wrote: > hi Fengdong Yu: > I want to use sqlContext.read.format('jdbc').options( ... ).load()
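One partial workaround in common use: the JDBC source's dbtable option accepts any subquery the database can alias, so a multi-table query can be pushed down. A sketch with hypothetical connection details:

    // push a multi-table query down to the database as an aliased subquery
    val df = sqlContext.read.format("jdbc").options(Map(
      "url" -> "jdbc:mysql://dbhost:3306/mydb",        // hypothetical URL
      "driver" -> "com.mysql.jdbc.Driver",
      "dbtable" -> "(SELECT a.id, b.name FROM a JOIN b ON a.id = b.a_id) t"
    )).load()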

RE: starting spark-shell throws /tmp/hive on HDFS should be writable error

2015-12-02 Thread Lin, Hao
Mich, did you run this locally or on EC2 (I use EC2)? Is this problem universal or specific to, say, EC2? Many thanks From: Mich Talebzadeh [mailto:m...@peridale.co.uk] Sent: Wednesday, December 02, 2015 5:01 PM To: Lin, Hao; user@spark.apache.org Subject: RE: starting spark-shell throws

RE: starting spark-shell throws /tmp/hive on HDFS should be writable error

2015-12-02 Thread Lin, Hao
I actually don't have the folder /tmp/hive created on my master node. Is that a problem? From: Mich Talebzadeh [mailto:m...@peridale.co.uk] Sent: Wednesday, December 02, 2015 5:40 PM To: Lin, Hao; user@spark.apache.org Subject: RE: starting spark-shell throws /tmp/hive on HDFS should be writable

Re: Spark Streaming 1.6 accumulating state across batches for joins

2015-12-02 Thread Aris
Please disregard the "window" functions... it turns out that was development code. Everything else is correct.

    val rawLEFT: DStream[String]  = ssc.textFileStream(dirLEFT).window(Seconds(30))
    val rawRIGHT: DStream[String] = ssc.textFileStream(dirRIGHT).window(Seconds(30))

should be

    val

Spark Streaming from S3

2015-12-02 Thread Michele Freschi
Hi all, I have an app streaming from S3 (textFileStream) and recently I've observed increasing delay and a long time to list files:

    INFO dstream.FileInputDStream: Finding new files took 394160 ms
    ...
    INFO scheduler.JobScheduler: Total delay: 404.796 s for time 144910020 ms (execution: 10.154

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Dibyendu Bhattacharya
The consumer I mentioned does not silently throw away data. If the offset is out of range, it starts from the earliest offset, and that is the correct way to recover from this error. Dibyendu On Dec 2, 2015 9:56 PM, "Cody Koeninger" wrote: > Again, just to be clear, silently throwing

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Cheng Lian
You may try to set Hadoop conf "parquet.enable.summary-metadata" to false to disable writing Parquet summary files (_metadata and _common_metadata). By default Parquet writes the summary files by collecting footers of all part-files in the dataset while committing the job. Spark also follows
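A sketch of setting that Hadoop conf from a Spark job (the write itself is hypothetical):

    // disable Parquet summary files before the write job runs
    sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
    df.write.partitionBy("date").parquet("/output/path")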

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Cody Koeninger
I believe that what differentiates reliable systems is that individual components should fail fast when their preconditions aren't met, and other components should be responsible for monitoring them. If a user of the direct stream thinks that your approach of restarting and ignoring data loss is the

Re: spark sql cli query results written to file ?

2015-12-02 Thread Sahil Sareen
Yeah, that's the example from the link I just posted. -Sahil On Thu, Dec 3, 2015 at 11:41 AM, Akhil Das wrote: > Something like this? > > val df = > sqlContext.read.load("examples/src/main/resources/users.parquet")df.select("name", >

LDA topic modeling and Spark

2015-12-02 Thread Nguyen, Tiffany T
Hello, I have been trying to understand the LDA topic modeling example provided here: https://spark.apache.org/docs/latest/mllib-clustering.html#latent-dirichlet-allocation-lda. In the example, they load word count vectors from a text file that contains these word counts and then they output
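For context, the linked example boils down to roughly the following (paths as in the Spark docs):

    import org.apache.spark.mllib.clustering.LDA
    import org.apache.spark.mllib.linalg.Vectors

    // each input line is a vector of word counts for one document
    val data = sc.textFile("data/mllib/sample_lda_data.txt")
    val parsedData = data.map(s => Vectors.dense(s.trim.split(' ').map(_.toDouble)))
    // LDA wants (documentId, wordCountVector) pairs
    val corpus = parsedData.zipWithIndex.map(_.swap).cache()
    val ldaModel = new LDA().setK(3).run(corpus)
    // topicsMatrix: vocabSize x k matrix of per-topic term weights
    val topics = ldaModel.topicsMatrix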

Re: spark sql cli query results written to file ?

2015-12-02 Thread Akhil Das
Something like this?

    val df = sqlContext.read.load("examples/src/main/resources/users.parquet")
    df.select("name", "favorite_color").write.save("namesAndFavColors.parquet")

It will save the name and favorite_color columns to a parquet file. You can read more information over here

Re: send this email to unsubscribe

2015-12-02 Thread Akhil Das
If you haven't unsubscribed already, shoot an email to user-unsubscr...@spark.apache.org Also read more here http://spark.apache.org/community.html Thanks Best Regards On Thu, Nov 26, 2015 at 7:51 AM, ngocan211 . wrote: > >

Re: Error in block pushing thread puts the KinesisReceiver in a stuck state

2015-12-02 Thread Akhil Das
Did you go through the executor logs completely? A "Futures timed out" exception can occur mostly when one of the tasks/jobs spends way too much time and fails to respond; this happens when there's a GC pause or memory overhead. Thanks Best Regards On Tue, Dec 1, 2015 at 12:09 AM, Spark Newbie

Re: Debug Spark

2015-12-02 Thread Masf
This is very interesting. Thanks!!! On Thu, Dec 3, 2015 at 8:28 AM, Sudhanshu Janghel < sudhanshu.jang...@cloudwick.com> wrote: > Hi, > > Here is a doc that I had created for my team. This has steps along with > snapshots of how to setup debugging in spark using IntelliJ locally. > > >

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Cody Koeninger
No, silently restarting from the earliest offset in the case of offset out of range exceptions during a streaming job is not the "correct way of recovery". If you do that, your users will be losing data without knowing why. It's more like a "way of ignoring the problem without actually

Re: Improve saveAsTextFile performance

2015-12-02 Thread Sahil Sareen
PTAL: http://stackoverflow.com/questions/29213404/how-to-split-an-rdd-into-multiple-smaller-rdds-given-a-max-number-of-rows-per -Sahil On Thu, Dec 3, 2015 at 9:18 AM, Ram VISWANADHA < ram.viswana...@dailymotion.com> wrote: > Yes. That did not help. > > Best Regards, > Ram > From: Ted Yu

Re: spark sql cli query results written to file ?

2015-12-02 Thread Sahil Sareen
Did you see: http://spark.apache.org/docs/latest/sql-programming-guide.html -Sahil On Thu, Dec 3, 2015 at 11:35 AM, fightf...@163.com wrote: > HI, > How could I save the spark sql cli running queries results and write the > results to some local file ? > Is there any

Re: Multiplication on decimals in a dataframe query

2015-12-02 Thread Sahil Sareen
+1, looks like a bug. I think referencing trades() twice in a multiplication is broken:

    scala> trades.select(trades("quantity")*trades("quantity")).show
    +---------------------+
    |(quantity * quantity)|
    +---------------------+
    |                 null|
    |                 null|

    scala>

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Adrien Mogenet
Very interested in that topic too, thanks Cheng for the direction! We'll give it a try as well. On 3 December 2015 at 01:40, Cheng Lian wrote: > You may try to set Hadoop conf "parquet.enable.summary-metadata" to false > to disable writing Parquet summary files

Re: Improve saveAsTextFile performance

2015-12-02 Thread Ram VISWANADHA
Yes. That did not help. Best Regards, Ram From: Ted Yu > Date: Wednesday, December 2, 2015 at 3:25 PM To: Ram VISWANADHA > Cc: user

Re: how to skip headers when reading multiple files

2015-12-02 Thread Jeff Zhang
Are you reading a csv file? If so, you can use spark-csv, which supports skipping the header: http://spark-packages.org/package/databricks/spark-csv On Thu, Dec 3, 2015 at 10:52 AM, Divya Gehlot wrote: > Hi, > I am new bee to Spark and Scala . > As one of my requirement to read and
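A sketch of the spark-csv route (the package coordinates and input path are assumptions):

    // launched with e.g. --packages com.databricks:spark-csv_2.10:1.3.0
    // spark-csv treats the first line of each file as a header
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("hdfs:///data/input/*.csv")   // hypothetical path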

Re: spark sql cli query results written to file ?

2015-12-02 Thread Akhil Das
Oops 3 mins late. :) Thanks Best Regards On Thu, Dec 3, 2015 at 11:49 AM, Sahil Sareen wrote: > Yeah, Thats the example from the link I just posted. > > -Sahil > > On Thu, Dec 3, 2015 at 11:41 AM, Akhil Das > wrote: > >> Something like this? >>

Re: how to skip headers when reading multiple files

2015-12-02 Thread Sahil Sareen
You could use "filter" to eliminate headers from your text file RDD while going over each line. -Sahil On Thu, Dec 3, 2015 at 9:37 AM, Jeff Zhang wrote: > Are you read csv file ? If so you can use spark-csv which support skip > header > >
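A sketch of that approach, assuming every file shares the same known header line:

    // drop any line equal to the known header text
    val header = "id,name,value"   // hypothetical header
    val rows = sc.textFile("hdfs:///data/input/*.csv").filter(_ != header)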

Re: Debug Spark

2015-12-02 Thread Sudhanshu Janghel
Hi, Here is a doc that I had created for my team. This has steps along with snapshots of how to setup debugging in spark using IntelliJ locally. https://docs.google.com/a/cloudwick.com/document/d/13kYPbmK61di0f_XxxJ-wLP5TSZRGMHE6bcTBjzXD0nA/edit?usp=sharing Kind Regards, Sudhanshu On Thu, Dec

Re: newbie how to upgrade a spark-ec2 cluster?

2015-12-02 Thread Vijay Gharge
Thanks Gourav! I will refer to Google for this. Regards, Vijay Gharge On Thu, Dec 3, 2015 at 1:26 PM, Gourav Sengupta wrote: > Vijay, > > please Google for AWS lambda + S3 there are several used cases available. > Lambda are event based triggers and are executed when

how to skip headers when reading multiple files

2015-12-02 Thread Divya Gehlot
Hi, I am a newbie to Spark and Scala. One of my requirements is to read and process multiple text files with headers using the DataFrame API. How can I skip headers when processing data with the DataFrame API? Thanks in advance. Regards, Divya

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Dibyendu Bhattacharya
Well, even if you set retention correctly and increase speed, OffsetOutOfRange can still occur depending on how your downstream processing keeps up. And if that happens, there is no other way to recover the old messages. So the best bet here, from the streaming job's point of view, is to start from the earliest offset rather

Re: newbie how to upgrade a spark-ec2 cluster?

2015-12-02 Thread Vijay Gharge
Hello Gourav, Can you please elaborate on the "trigger" part? Any reference link will be really useful! On Thursday 3 December 2015, Gourav Sengupta wrote: > Hi, > > And so you have the money to keep a SPARK cluster up and running? The way > I make it work is test the

Re: Multiplication on decimals in a dataframe query

2015-12-02 Thread Akhil Das
Not quite sure what's happening, but it's not an issue with multiplication, I guess, as the following query worked for me:

    trades.select(trades("price")*9.5).show
    +-------------+
    |(price * 9.5)|
    +-------------+
    |        199.5|
    |        228.0|
    |        190.0|
    |        199.5|
    |        190.0|
    |

spark sql cli query results written to file ?

2015-12-02 Thread fightf...@163.com
HI, How can I save the results of queries run from the Spark SQL CLI and write them to some local file? Is there any available command? Thanks, Sun. fightf...@163.com

Re: Debug Spark

2015-12-02 Thread Akhil Das
This doc will get you started https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IntelliJ Thanks Best Regards On Sun, Nov 29, 2015 at 9:48 PM, Masf wrote: > Hi > > Is it possible to debug spark locally with IntelliJ or another

Re: newbie how to upgrade a spark-ec2 cluster?

2015-12-02 Thread Ted Yu
Have you seen this thread ? http://search-hadoop.com/m/q3RTtvmsYMv0tKh2=Re+Upgrading+Spark+in+EC2+clusters On Wed, Dec 2, 2015 at 2:39 PM, Andy Davidson wrote: > I am using spark-1.5.1-bin-hadoop2.6. I used > spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create a

Improve saveAsTextFile performance

2015-12-02 Thread Ram VISWANADHA
JavaRDD.saveAsTextFile is taking a long time to succeed. There are 10 tasks; the first 9 complete in a reasonable time, but the last task is taking a long time to complete. The last task contains the bulk of the records, around 90% of the total. Is there any way to
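When the slowness comes from one oversized partition like this, redistributing the data with a shuffle (repartition, rather than coalesce, which the thread notes did not help) is the usual first try; a sketch, where the partition count is a guess to tune and the path is hypothetical:

    // spread the skewed data across more, evenly sized partitions before writing
    rdd.repartition(100).saveAsTextFile("hdfs:///output/path")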

Re: Improve saveAsTextFile performance

2015-12-02 Thread Ted Yu
Have you tried calling coalesce() before saveAsTextFile ? Cheers On Wed, Dec 2, 2015 at 3:15 PM, Ram VISWANADHA < ram.viswana...@dailymotion.com> wrote: > JavaRDD.saveAsTextFile is taking a long time to succeed. There are 10 > tasks, the first 9 complete in a reasonable time but the last task

Re: newbie how to upgrade a spark-ec2 cluster?

2015-12-02 Thread Gourav Sengupta
Hi, And so you have the money to keep a SPARK cluster up and running? The way I make it work is to test the code locally with a localised Spark installation, and then create a data pipeline triggered by Lambda, which starts a SPARK cluster, processes the data via SPARK steps, and then terminates

Sharing object/state across transformations

2015-12-02 Thread JayKay
I'm new to Apache Spark and an absolute beginner. I'm playing around with Spark Streaming (API version 1.5.1) in Java and want to implement a prototype which uses HyperLogLog to estimate distinct elements. I use the stream-lib from clearspring (https://github.com/addthis/stream-lib). I planned
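One way to keep such a sketch as running state, along the lines of the updateStateByKey pointer in Ted's reply further down, is sketched below; it assumes stream-lib's HyperLogLog is on the classpath and serializes cleanly, and that pairs is a hypothetical DStream[(String, String)]:

    import com.clearspring.analytics.stream.cardinality.HyperLogLog

    ssc.checkpoint("hdfs:///checkpoints")   // updateStateByKey requires checkpointing

    // fold each batch's values into a per-key HyperLogLog kept as stream state
    val estimates = pairs.updateStateByKey[HyperLogLog] { (values: Seq[String], state: Option[HyperLogLog]) =>
      val hll = state.getOrElse(new HyperLogLog(12))   // 2^12 registers: accuracy/memory trade-off
      values.foreach(v => hll.offer(v))
      Some(hll)
    }
    estimates.mapValues(_.cardinality()).print()   // per-key distinct-count estimates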

Re: [POWERED BY] Please add our organization

2015-12-02 Thread Adrien Mogenet
Hi folks, You're probably busy, but any update on this? :) On 16 November 2015 at 16:04, Adrien Mogenet < adrien.moge...@contentsquare.com> wrote: > Name: Content Square > URL: http://www.contentsquare.com > > Description: > We use Spark to regularly read raw data, convert them into Parquet,

Re: [POWERED BY] Please add our organization

2015-12-02 Thread Adrien Mogenet
Oh, right! I think it was user@ at the time I wrote my first message but it's clear now! Thanks Sean, On 2 December 2015 at 11:56, Sean Owen wrote: > Same, not sure if anyone handles this particularly but I'll do it. > This should go to dev@; I think we just put a note on

Re: [POWERED BY] Please add our organization

2015-12-02 Thread Sean Owen
Same, not sure if anyone handles this particularly but I'll do it. This should go to dev@; I think we just put a note on that wiki. On Wed, Dec 2, 2015 at 10:53 AM, Adrien Mogenet wrote: > Hi folks, > > You're probably busy, but any update on this? :) > > > On

Re: Spark Streaming - History UI

2015-12-02 Thread patcharee
I meant there is no streaming tab at all. It looks like I need version 1.6. Patcharee On 02 Dec 2015 11:34, Steve Loughran wrote: The history UI doesn't update itself for live apps (SPARK-7889) - though I'm working on it Are you trying to view a running streaming job? On 2 Dec 2015, at

Jupyter configuration

2015-12-02 Thread Roberto Pagliari
Does anyone have a pointer to Jupyter configuration with pyspark? The current material on the IPython notebook is out of date, and Jupyter ignores IPython profiles. Thank you,

newbie how to upgrade a spark-ec2 cluster?

2015-12-02 Thread Andy Davidson
I am using spark-1.5.1-bin-hadoop2.6. I used spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create a cluster. Any idea how I can upgrade to 1.5.2 prebuilt binary? Also if I choose to build the binary, how would I upgrade my cluster? Kind regards Andy

Re: Can Spark Execute Hive Update/Delete operations

2015-12-02 Thread Ted Yu
The referenced link seems to be w.r.t. Hive on Spark which is still in its own branch of Hive. FYI On Tue, Dec 1, 2015 at 11:23 PM, 张炜 wrote: > Hello Ted and all, > We are using Hive 1.2.1 and Spark 1.5.1 > I also noticed that there are other users reporting this

Re: Sharing object/state across transformations

2015-12-02 Thread Ted Yu
Have you taken a look at streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java, especially testUpdateStateByKeyWithInitial()? Cheers On Wed, Dec 2, 2015 at 2:54 AM, JayKay wrote: > I'm new to Apache Spark and an absolute beginner. I'm playing around

Re: Pyspark submitted app just hangs

2015-12-02 Thread Darren Govoni
The pyspark app stdout/err log shows this oddity.

    Traceback (most recent call last):
      File "/root/spark/notebooks/ingest/XXX.py", line 86, in
        print pdfRDD.collect()[:5]
      File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 773, in collect
      File

Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Yanbo Liang
You can get 1.6.0-RC1 from http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/ currently, but it's not the last release version. 2015-12-02 23:57 GMT+08:00 Vishnu Viswanath : > Thank you Yanbo, > > It looks like this is available in 1.6 version

Pyspark submitted app just hangs

2015-12-02 Thread Darren Govoni
Hi all, Wondering if someone can provide some insight into why this pyspark app is just hanging. Here is the output:

    ...
    15/12/03 01:47:05 INFO TaskSetManager: Starting task 21.0 in stage 0.0 (TID 21, 10.65.143.174, PROCESS_LOCAL, 1794787 bytes)
    15/12/03 01:47:05 INFO TaskSetManager: Starting task

Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Vishnu Viswanath
Thank you. On Wed, Dec 2, 2015 at 8:12 PM, Yanbo Liang wrote: > You can get 1.6.0-RC1 from > http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/ > currently, but it's not the last release version. > > 2015-12-02 23:57 GMT+08:00 Vishnu Viswanath