[no subject]

2017-10-22 Thread 梁义怀

Re: Spark ML - LogisticRegression interpreting prediction

2017-10-22 Thread Weichen Xu
The values you want (the ones that add up to 1.0) are in the "probability" column, not "rawPrediction". Thanks! On Mon, Oct 23, 2017 at 1:20 AM, pun wrote: > Hello, > I have a LogisticRegression model for predicting a binary label. Once I > train the model, I run it to get some
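To illustrate the reply, a minimal Scala sketch (the training/test DataFrames and column layout are assumptions, not from the thread) that reads the probability column, whose entries sum to 1.0, rather than rawPrediction:

    import org.apache.spark.ml.classification.LogisticRegression

    // Assumes DataFrames `training` and `test` with "features" and "label" columns.
    val lr = new LogisticRegression()
    val model = lr.fit(training)
    val predictions = model.transform(test)

    // "probability" holds the class probabilities (they sum to 1.0);
    // "rawPrediction" holds the raw margins, which do not.
    predictions.select("probability", "rawPrediction", "prediction").show(false)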

Spark ML - LogisticRegression interpreting prediction

2017-10-22 Thread pun
Hello, I have a LogisticRegression model for predicting a binary label. Once I train the model, I run it to get some predictions. I get the following values for rawPrediction. How should I interpret these? What do they mean? |rawPrediction| …
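For a binary model, rawPrediction holds the raw margins and probability is their logistic transform. A small sketch of that relationship, assuming the two-element vector layout Spark ML uses for binary logistic regression (the margin value itself is hypothetical):

    // rawPrediction = [-margin, margin]; probability = [1 - sigmoid(margin), sigmoid(margin)]
    val margin = 2.3 // hypothetical value of rawPrediction(1)
    val pPositive = 1.0 / (1.0 + math.exp(-margin))
    println(f"P(label = 1) = $pPositive%.4f") // prints ~0.9089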

[Spark API] - Dynamic precision for same BigDecimal value

2017-10-22 Thread Irina Stan
Hi team, I'm a software developer working with Apache Spark. Last week I encountered a strange issue, which might be a bug: I see different precision for the same BigDecimal value when calling map() once against a DataFrame created as val df = sc.parallelize(seq).toDF(), and a second
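A minimal Scala sketch of the kind of comparison described (the sequence contents and the second construction path are assumptions, since the message is truncated here):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("BigDecimalPrecision").getOrCreate()
    import spark.implicits._

    // Hypothetical input: the same BigDecimal value, wrapped in a case class.
    case class Rec(v: BigDecimal)
    val seq = Seq(Rec(BigDecimal("1.20")))

    // Path 1: RDD -> DataFrame via toDF(), then map() over the rows.
    val df1 = spark.sparkContext.parallelize(seq).toDF()
    df1.map(_.getDecimal(0).toString).show(false)

    // Path 2: DataFrame built directly from the local Seq; per the report,
    // the printed precision/scale may differ from path 1.
    val df2 = seq.toDF()
    df2.map(_.getDecimal(0).toString).show(false)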

Re: Bulk load to HBase

2017-10-22 Thread Jörn Franke
Before you look at any new library/tool: what is the process of importing, what is the original file format, file size, compression, etc.? Once you have investigated this, you can start improving it. Then, as a last step, a new framework can be explored. Feel free to share those details and we can help you

Bulk load to HBase

2017-10-22 Thread Pradeep
We are on Hortonworks 2.5 and will very soon upgrade to 2.6, Spark version 1.6.2. We have a large volume of data that we bulk load into HBase using ImportTsv. The MapReduce job is very slow, and we are looking for options to use Spark to improve performance. Please let me know if this can be optimized with
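A hedged Scala sketch of the Spark bulk-load pattern being asked about: write HFiles from Spark, then load them with completebulkload. Paths, table, column family, and TSV layout are hypothetical, and a real job would also call HFileOutputFormat2.configureIncrementalLoad so the HFiles match the table's region boundaries:

    import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue}
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
    import org.apache.hadoop.hbase.util.Bytes

    val conf = HBaseConfiguration.create()

    // sc: SparkContext (e.g. from spark-shell). Hypothetical TSV: rowkey \t value
    val kvs = sc.textFile("/data/input.tsv")
      .map(_.split("\t"))
      .map { f =>
        val row = Bytes.toBytes(f(0))
        (new ImmutableBytesWritable(row),
         new KeyValue(row, Bytes.toBytes("cf"), Bytes.toBytes("c1"), Bytes.toBytes(f(1))))
      }
      .sortByKey() // HFiles must be written in sorted row-key order

    kvs.saveAsNewAPIHadoopFile("/tmp/hfiles",
      classOf[ImmutableBytesWritable], classOf[KeyValue],
      classOf[HFileOutputFormat2], conf)

    // Then move the HFiles into the table, e.g.:
    // hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles mytable

This avoids the per-row write path entirely, which is usually where ImportTsv-style loads lose time.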