Bootstrap Action to Install Spark 2.0 on EMR?

2016-07-02 Thread Renxia Wang
Hi all, Has anybody tried out Spark 2.0 on EMR 4.x? Will it work? I am looking for a bootstrap action script to install it on EMR. Does someone have a working one to share? I'd appreciate it! Best, Renxia

latest version of Spark to work OK as Hive engine

2016-07-02 Thread Ashok Kumar
Hi, Looking at the presentation "Hive on Spark is Blazing Fast": what is the latest version of Spark that can run as an engine for Hive? Thanks. P.S. I am aware of Hive on Tez, but that is not what I am interested in here. Warmest regards

Re: spark parquet too many small files ?

2016-07-02 Thread sri hari kali charan Tummala
Hi Takeshi, I can't use coalesce in the spark-sql shell, right? I know we can use coalesce in a Spark application written in Scala, but in my project we are not building a jar or using Python; we are just executing a Hive query in the spark-sql shell and submitting it to YARN in client mode. Example: spark-sql --verbose

Working of Streaming Kmeans

2016-07-02 Thread Biplob Biswas
Hi, I wanted to ask a very basic question about how Streaming KMeans works. Does the model update only during training (i.e. when the training dataset is used), or does it also update in the predictOnValues function for the test dataset? Thanks and regards, Biplob

RE: Spark 2.0.0-preview ... problem with jackson core version

2016-07-02 Thread Paolo Patierno
Yes! We got it :-) Btw, it's not available on Maven yet. :-( Paolo Patierno, Senior Software Engineer (IoT) @ Red Hat, Microsoft MVP on Windows Embedded & IoT, Microsoft Azure Advisor. Twitter: @ppatierno, LinkedIn: paolopatierno, Blog: DevExperience > From: so...@cloudera.com > Date: Sat, 2 Jul

Re: Spark 2.0.0-preview ... problem with jackson core version

2016-07-02 Thread Sean Owen
Ah, it looks like it was 2.5.3 as of 2.0.0-preview: https://github.com/apache/spark/blob/2.0.0-preview/pom.xml#L164 but was updated to 2.6.5 soon after that, since it was 2.6.5 in 2.0.0-RC1: https://github.com/apache/spark/blob/v2.0.0-rc1/pom.xml#L163 On Sat, Jul 2, 2016 at 3:04 PM, Paolo

RE: Spark 2.0.0-preview ... problem with jackson core version

2016-07-02 Thread Paolo Patierno
This sounds strange to me, because here: https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11/2.0.0-preview I see: com.fasterxml.jackson.module » jackson-module-scala_2.11 2.5.3. So it seems that 2.0.0-preview is bringing in jackson-module-scala 2.5.3; that is what I see.

Re: Spark 2.0.0-preview ... problem with jackson core version

2016-07-02 Thread Sean Owen
This is something to do with your app. The version is 2.6.5 in master and branch-2.0, and jackson-module-scala is managed to this version along with all the other jackson artifacts. On Sat, Jul 2, 2016 at 1:35 PM, Paolo Patierno wrote: > What I see is the following ... > > -

RE: Spark 2.0.0-preview ... problem with jackson core version

2016-07-02 Thread Paolo Patierno
What I see is the following ... - Working configuration Spark Version : "2.0.0-SNAPSHOT" The Vert.x library brings ... jackson-annotations:2.6.0 jackson-core:2.6.1 jackson-databind:2.6.1 Spark brings jackson-annotations:2.6.5 jackson-core:2.6.5 jackson-databind:2.6.5 jackson-module-scala_2.11:

Spark-13979: issues with hadoopConf

2016-07-02 Thread Gil Vernik
Hello, Any ideas about this one: https://issues.apache.org/jira/browse/SPARK-13979 ? Do others see the same issue? Thanks, Gil.

Re: Several questions about how pyspark.ml works

2016-07-02 Thread Yanbo Liang
Hi Nick, Please see my inline reply. Thanks Yanbo 2016-06-12 3:08 GMT-07:00 XapaJIaMnu : > Hey, > > I have some additional Spark ML algorithms implemented in scala that I > would > like to make available in pyspark. For a reference I am looking at the > available logistic

Re: Trainning a spark ml linear regresion model fail after migrating from 1.5.2 to 1.6.1

2016-07-02 Thread Yanbo Liang
Yes, WeightedLeastSquares currently cannot solve some ill-conditioned problems; community members have put effort into resolving this (SPARK-13777). As a workaround, you can set the solver to "l-bfgs", which will train the LinearRegressionModel with the L-BFGS optimization method. 2016-06-09
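To make the "ill-conditioned" case concrete, here is a minimal plain-Python sketch (not Spark code; the toy data is made up for illustration). When one feature is an exact multiple of another, the Gram matrix X^T X is singular, so the normal equations have no unique solution for a direct solver to compute. Iterative optimizers such as L-BFGS do not need to invert that matrix, which is presumably why switching the solver sidesteps the failure.

```python
# Hypothetical toy data: the second feature is exactly 2x the first,
# so the two feature columns are perfectly collinear.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]


def gram_matrix(rows):
    """Return X^T X for a list of equal-length feature rows."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


gram = gram_matrix(X)

# A (near-)zero determinant means the normal equations (X^T X) w = X^T y
# have no unique solution, so a direct solve breaks down.
singular = abs(det2(gram)) < 1e-9
print(gram, singular)
```

In PySpark, the workaround described above would correspond to the `solver` parameter of `pyspark.ml.regression.LinearRegression`; that mapping is stated here as an assumption about the poster's intent, not as something the message spells out.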

Re: Get both feature importance and ROC curve from a random forest classifier

2016-07-02 Thread Yanbo Liang
Hi Mathieu, Using the new ml package to train a RandomForestClassificationModel, you can get feature importance. Then you can convert the prediction result to an RDD and feed it into BinaryClassificationEvaluator for the ROC curve. You can refer to the following code snippet: val rf = new
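As a rough illustration of what the ROC step consumes, the following plain-Python sketch (not Spark's implementation, and not the snippet from the message, which is cut off above) builds ROC points from (score, label) pairs such as those a classifier's predictions would yield. The sample scores and labels are invented, and the sketch assumes all scores are distinct.

```python
def roc_points(scored):
    """Return ROC points (FPR, TPR) from (score, label) pairs, label in {0, 1}.

    Assumes at least one positive and one negative, and distinct scores.
    """
    positives = sum(1 for _, label in scored if label == 1)
    negatives = len(scored) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the decision threshold one example at a time, highest score first.
    for _, label in sorted(scored, key=lambda pair: pair[0], reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical (score, label) pairs standing in for model predictions.
predictions = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.2, 0)]
curve = roc_points(predictions)
auc = area_under(curve)
print(curve, auc)
```

The curve always starts at (0, 0) and ends at (1, 1); the area under it is the usual single-number summary of ranking quality.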

Re: Enforcing shuffle hash join

2016-07-02 Thread Takeshi Yamamuro
Hi, No, spark has no hint for the hash join. // maropu On Fri, Jul 1, 2016 at 4:56 PM, Lalitha MV wrote: > Hi, > > In order to force broadcast hash join, we can set > the spark.sql.autoBroadcastJoinThreshold config. Is there a way to enforce > shuffle hash join in spark

Re: spark parquet too many small files ?

2016-07-02 Thread Takeshi Yamamuro
Please also see https://issues.apache.org/jira/browse/SPARK-16188. // maropu On Fri, Jul 1, 2016 at 7:39 PM, kali.tumm...@gmail.com < kali.tumm...@gmail.com> wrote: > I found the jira for the issue will there be a fix in future ? or no fix ? > > https://issues.apache.org/jira/browse/SPARK-6221

Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-02 Thread Takeshi Yamamuro
This is probably because the current thrift-server implementation has a `SparkContext` inside (see: https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala#L34 ). To support yarn-cluster, we need to add a lot of

Re: Ideas to put a Spark ML model in production

2016-07-02 Thread Yanbo Liang
Let's suppose you have trained a LogisticRegressionModel and saved it at "/tmp/lr-model". You can copy the directory to the production environment and use it to make predictions on users' new data. You can refer to the following code snippets: val model = LogisticRegressionModel.load("/tmp/lr-model") val

Re: Spark 2.0.0-preview ... problem with jackson core version

2016-07-02 Thread Sean Owen
mvn dependency:tree? On Sat, Jul 2, 2016 at 12:46 AM, Charles Allen wrote: > I'm having the same difficulty porting > https://github.com/metamx/druid-spark-batch/tree/spark2 over to spark2.x, > where I have to go track down who is pulling in bad jackson versions. >
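Once `mvn dependency:tree` output is in hand, spotting which Jackson artifacts appear at more than one version can be scripted. The sketch below is one possible approach in plain Python, not an official tool. The sample lines are a simplified, hypothetical excerpt: the Jackson versions mirror those quoted elsewhere in this thread (2.6.1 via Vert.x, 2.6.5 via Spark), the Vert.x version is a placeholder, and real output may carry extra annotations that this pattern does not handle.

```python
import re
from collections import defaultdict

# Simplified, hypothetical dependency-tree lines (see the caveats above).
SAMPLE_LINES = [
    "[INFO] +- io.vertx:vertx-core:jar:0.0.0:compile",  # placeholder version
    "[INFO] |  +- com.fasterxml.jackson.core:jackson-core:jar:2.6.1:compile",
    "[INFO] |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.1:compile",
    "[INFO] +- org.apache.spark:spark-core_2.11:jar:2.0.0-SNAPSHOT:compile",
    "[INFO] |  +- com.fasterxml.jackson.core:jackson-core:jar:2.6.5:compile",
    "[INFO] |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.5:compile",
    "[INFO] |  +- com.fasterxml.jackson.module:jackson-module-scala_2.11:jar:2.6.5:compile",
]

# Matches group:artifact:jar:version:scope for Jackson group ids only.
JACKSON_COORD = re.compile(
    r"(com\.fasterxml\.jackson[\w.\-]*):([\w.\-]+):jar:([\w.\-]+):\w+"
)


def jackson_versions(lines):
    """Map each Jackson artifact id to the set of versions seen in the lines."""
    seen = defaultdict(set)
    for line in lines:
        match = JACKSON_COORD.search(line)
        if match:
            _group, artifact, version = match.groups()
            seen[artifact].add(version)
    return seen


def version_conflicts(lines):
    """Return only the artifacts that appear with more than one version."""
    return {artifact: sorted(versions)
            for artifact, versions in jackson_versions(lines).items()
            if len(versions) > 1}


conflicts = version_conflicts(SAMPLE_LINES)
print(conflicts)
```

Artifacts listed in the result are the ones worth tracing back to their parent dependency in the tree.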

Re: Custom Optimizer

2016-07-02 Thread Yanbo Liang
Spark MLlib does not support plugging in a custom optimizer, since the optimizer interface is private. Thanks, Yanbo 2016-06-23 16:56 GMT-07:00 Stephen Boesch : > My team has a custom optimization routine that we would have wanted to > plug in as a replacement for the default LBFGS /

Re: Spark ML - Java implementation of custom Transformer

2016-07-02 Thread Yanbo Liang
Hi Mehdi, Could you share your code so that we can help you figure out the problem? Actually, JavaTestParams works well, but there is a compatibility issue with JavaDeveloperApiExample. We have temporarily removed JavaDeveloperApiExample in Spark 2.0 so as not to confuse users. Since the