Re: how to retain part of the features in LogisticRegressionModel (spark2.0)

2017-03-20 Thread Yan Facai
Hi, jinhong. Do you use `setRegParam`, which is 0.0 by default? Both elasticNetParam and regParam are required if regularization is needed:

val regParamL1 = $(elasticNetParam) * $(regParam)
val regParamL2 = (1.0 - $(elasticNetParam)) * $(regParam)
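A minimal sketch of the point above, assuming a training DataFrame `training` with the usual "label"/"features" columns (the variable name and data are hypothetical):

```scala
import org.apache.spark.ml.classification.LogisticRegression

// regParam defaults to 0.0, so setting elasticNetParam alone applies no
// penalty: regParamL1 = elasticNetParam * regParam = 1.0 * 0.0 = 0.0.
val lr = new LogisticRegression()
  .setElasticNetParam(1.0) // pure L1
  .setRegParam(0.1)        // must be non-zero for any regularization to apply

// val model = lr.fit(training)
```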

Re: Outstanding Spark 2.1.1 issues

2017-03-20 Thread Holden Karau
I'm not super sure it should be a blocker for 2.1.1 -- is it a regression? Maybe we can get TD's input on it?

Re: Outstanding Spark 2.1.1 issues

2017-03-20 Thread Nan Zhu
I think https://issues.apache.org/jira/browse/SPARK-19280 should be a blocker.

Best,
Nan

Re: Outstanding Spark 2.1.1 issues

2017-03-20 Thread Felix Cheung
I've been scrubbing R and think we are tracking 2 issues:

https://issues.apache.org/jira/browse/SPARK-19237
https://issues.apache.org/jira/browse/SPARK-19925

Re: Why are DataFrames always read with nullable=True?

2017-03-20 Thread Kazuaki Ishizaki
Hi. Regarding the reading part for nullable, it seems a data cleaning step is being considered, as Xiao said at https://www.mail-archive.com/user@spark.apache.org/msg39233.html. Here is a PR, https://github.com/apache/spark/pull/17293, to add the data cleaning step that throws an exception if …
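Not the PR's actual code, just a sketch of the general idea of such a cleaning step, assuming a DataFrame `df` and a column name `col` (both hypothetical):

```scala
import org.apache.spark.sql.DataFrame

// Illustrative only: fail fast if a column that is supposed to be
// non-nullable actually contains nulls.
def assertNoNulls(df: DataFrame, col: String): Unit = {
  val nullCount = df.filter(df(col).isNull).count()
  if (nullCount > 0) {
    throw new IllegalStateException(s"Column '$col' contains $nullCount null value(s)")
  }
}
```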

Re: Why are DataFrames always read with nullable=True?

2017-03-20 Thread Takeshi Yamamuro
Hi. Have you checked the related JIRA? e.g., https://issues.apache.org/jira/browse/SPARK-19950. If you have any asks or requests, it's better to raise them there. Thanks! // maropu

Re: Outstanding Spark 2.1.1 issues

2017-03-20 Thread Daniel Siegmann
Any chance of back-porting SPARK-14536 (NPE in JDBCRDD when an array column contains nulls, PostgreSQL)? It just adds a null check - a simple bug fix - so it really belongs in Spark 2.1.x.

Outstanding Spark 2.1.1 issues

2017-03-20 Thread Holden Karau
Hi Spark Developers! As we start working on the Spark 2.1.1 release, I've been looking at our outstanding issues still targeted for it. I've tried to break it down by component so that people in charge of each component can take a quick look and see if any of these things can/should be re-targeted …

Why are DataFrames always read with nullable=True?

2017-03-20 Thread Jason White
If I create a dataframe in Spark with non-nullable columns, and then save that to disk as a Parquet file, the columns are properly marked as non-nullable. I confirmed this using parquet-tools. Then, when loading it back, Spark forces the nullable back to True.
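A minimal reproduction sketch of the behavior described, assuming a local SparkSession `spark` (the output path is arbitrary):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Build a DataFrame whose only column is explicitly non-nullable.
val schema = StructType(Seq(StructField("id", LongType, nullable = false)))
val rdd = spark.sparkContext.parallelize(Seq(Row(1L), Row(2L)))
val df = spark.createDataFrame(rdd, schema)

println(df.schema("id").nullable) // false

df.write.parquet("/tmp/nullable-check")
// Reading it back, Spark marks the column nullable again.
println(spark.read.parquet("/tmp/nullable-check").schema("id").nullable) // true
```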

Re: Should we consider a Spark 2.1.1 release?

2017-03-20 Thread Ted Yu
Timur: Mind starting a new thread? I have the same question as you.

Re: Should we consider a Spark 2.1.1 release?

2017-03-20 Thread Holden Karau
I think questions around how long the 1.6 series will be supported are really important, but probably belong in a different thread than the 2.1.1 release discussion.

Re: Should we consider a Spark 2.1.1 release?

2017-03-20 Thread Timur Shenkao
Hello guys,

Spark benefits from stable versions, not frequent ones. A lot of people still have 1.6.x in production. Those who want the freshest (like me) can always deploy nightly builds. My question is: how long will version 1.6 be supported?

Re: how to retain part of the features in LogisticRegressionModel (spark2.0)

2017-03-20 Thread Yanbo Liang
Do you want to get a sparse model where most of the coefficients are zeros? If yes, using L1 regularization leads to sparsity. But the LogisticRegressionModel coefficients vector's size is still equal to the number of features; you can get the non-zero elements manually. Actually, it would be a …
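A sketch of pulling out the non-zero coefficients by hand, assuming a fitted binary LogisticRegressionModel `model` (so `coefficients` is a single vector):

```scala
// One coefficient per feature; with L1 regularization many will be exactly 0.0.
val nonZero = model.coefficients.toArray.zipWithIndex
  .collect { case (weight, idx) if weight != 0.0 => (idx, weight) }

nonZero.foreach { case (idx, weight) => println(s"feature $idx -> $weight") }
```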

Re: Issues: Generate JSON with null values in Spark 2.0.x

2017-03-20 Thread Chetan Khatri
Exactly.

On Sat, Mar 11, 2017 at 1:35 PM, Dongjin Lee wrote:
> Hello Chetan,
>
> Could you post some code? If I understood correctly, you are trying to
> save JSON like:
>
> {
>   "first_name": "Dongjin",
>   "last_name": null
> }
>
> not in omitted form, like:
>
> {
>   "first_name": "Dongjin"
> }
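A small sketch of the behavior under discussion, assuming Spark 2.0.x defaults (the data and output path are made up): by default the JSON writer drops fields whose value is null, which is exactly the "omitted form" above.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("Dongjin", null: String)).toDF("first_name", "last_name")

// Each output line is {"first_name":"Dongjin"} -- the null field is omitted,
// not written as "last_name": null.
df.write.json("/tmp/json-null-check")
```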