[jira] [Commented] (SPARK-28555) Recover options and properties and pass them back into the v1 API
[ https://issues.apache.org/jira/browse/SPARK-28555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895470#comment-16895470 ] Xin Ren commented on SPARK-28555:

I'm working on it :)

> Recover options and properties and pass them back into the v1 API
>
> Key: SPARK-28555
> URL: https://issues.apache.org/jira/browse/SPARK-28555
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Xin Ren
> Priority: Minor
>
> When tables are created, the {{CREATE TABLE}} syntax supports both {{TBLPROPERTIES}} and {{OPTIONS}}. Options were used in v1 to configure the table itself, like options passed to {{DataFrameReader}}. Right now, both properties and options are stored in v2 table properties, because v2 only has properties, not both. But we aren't able to recover which properties were set through {{OPTIONS}} and which were set through {{TBLPROPERTIES}}.
> Instead of the current behavior, I think options should be prefixed with {{option.}}. That way, we can recover options and properties and pass them back into the v1 API.
[jira] [Updated] (SPARK-28555) Recover options and properties and pass them back into the v1 API
[ https://issues.apache.org/jira/browse/SPARK-28555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-28555:

Issue Type: Sub-task (was: Improvement)
Parent: SPARK-22386

> Recover options and properties and pass them back into the v1 API
>
> Key: SPARK-28555
> URL: https://issues.apache.org/jira/browse/SPARK-28555
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Xin Ren
> Priority: Minor
>
> When tables are created, the {{CREATE TABLE}} syntax supports both {{TBLPROPERTIES}} and {{OPTIONS}}. Options were used in v1 to configure the table itself, like options passed to {{DataFrameReader}}. Right now, both properties and options are stored in v2 table properties, because v2 only has properties, not both. But we aren't able to recover which properties were set through {{OPTIONS}} and which were set through {{TBLPROPERTIES}}.
> Instead of the current behavior, I think options should be prefixed with {{option.}}. That way, we can recover options and properties and pass them back into the v1 API.
[jira] [Created] (SPARK-28555) Recover options and properties and pass them back into the v1 API
Xin Ren created SPARK-28555:

Summary: Recover options and properties and pass them back into the v1 API
Key: SPARK-28555
URL: https://issues.apache.org/jira/browse/SPARK-28555
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Xin Ren

When tables are created, the {{CREATE TABLE}} syntax supports both {{TBLPROPERTIES}} and {{OPTIONS}}. Options were used in v1 to configure the table itself, like options passed to {{DataFrameReader}}. Right now, both properties and options are stored in v2 table properties, because v2 only has properties, not both. But we aren't able to recover which properties were set through {{OPTIONS}} and which were set through {{TBLPROPERTIES}}.

Instead of the current behavior, I think options should be prefixed with {{option.}}. That way, we can recover options and properties and pass them back into the v1 API.
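A minimal sketch of the proposed convention (the object and method names here are illustrative, not from the Spark codebase; only the {{option.}} prefix comes from the issue):

{code}
object TablePropertyConventions {
  private val OptionPrefix = "option."

  // Store v1 OPTIONS alongside TBLPROPERTIES in the single v2 property map,
  // marking each option with the proposed "option." prefix.
  def toV2Properties(
      options: Map[String, String],
      properties: Map[String, String]): Map[String, String] =
    properties ++ options.map { case (k, v) => (OptionPrefix + k) -> v }

  // Recover the original v1 split from a v2 property map.
  def fromV2Properties(
      v2Props: Map[String, String]): (Map[String, String], Map[String, String]) = {
    val (prefixed, plain) = v2Props.partition { case (k, _) => k.startsWith(OptionPrefix) }
    (prefixed.map { case (k, v) => k.stripPrefix(OptionPrefix) -> v }, plain)
  }
}
{code}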
[jira] [Closed] (SPARK-28139) DataSourceV2: Add AlterTable v2 implementation
[ https://issues.apache.org/jira/browse/SPARK-28139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren closed SPARK-28139.

> DataSourceV2: Add AlterTable v2 implementation
>
> Key: SPARK-28139
> URL: https://issues.apache.org/jira/browse/SPARK-28139
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Priority: Major
> Fix For: 3.0.0
>
> SPARK-27857 updated the parser for v2 ALTER TABLE statements. This tracks implementing those using a v2 catalog.
[jira] [Updated] (SPARK-20498) RandomForestRegressionModel should expose getMaxDepth in PySpark
[ https://issues.apache.org/jira/browse/SPARK-20498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-20498:

Sure, please go ahead.

> RandomForestRegressionModel should expose getMaxDepth in PySpark
>
> Key: SPARK-20498
> URL: https://issues.apache.org/jira/browse/SPARK-20498
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Assignee: Xin Ren
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
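For reference, the Scala estimator and model already expose this getter through the shared decision-tree params, which is what the PySpark (and SparkR) wrappers need to mirror; a minimal sketch (the column names are placeholders):

{code}
import org.apache.spark.ml.regression.RandomForestRegressor

// maxDepth lives in the shared DecisionTreeParams trait, so Scala users
// already get a typed getter on both the estimator and the fitted model.
val rf = new RandomForestRegressor()
  .setLabelCol("label")       // placeholder column names
  .setFeaturesCol("features")
  .setMaxDepth(5)

println(rf.getMaxDepth)       // 5; the Python wrapper should mirror this getter
{code}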
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel summary should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974188#comment-15974188 ] Xin Ren commented on SPARK-19282:

Sorry [~bryanc], I'm just back from vacation... and sure, I'd love to help, just let me know :)

> RandomForestRegressionModel summary should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark, SparkR
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Assignee: Xin Ren
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel summary should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974186#comment-15974186 ] Xin Ren commented on SPARK-19282:

Yes, on the R side both parameters are exposed.

> RandomForestRegressionModel summary should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark, SparkR
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Assignee: Xin Ren
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel summary should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924728#comment-15924728 ] Xin Ren commented on SPARK-19282:

Thanks Bryan. Could you please create some sub-tasks under SPARK-10931? I'd like to help on them if possible.

> RandomForestRegressionModel summary should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark, SparkR
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Assignee: Xin Ren
> Priority: Minor
> Fix For: 2.2.0
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel summary should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924374#comment-15924374 ] Xin Ren commented on SPARK-19282:

Sure, I'm working on the Python part :)

> RandomForestRegressionModel summary should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark, SparkR
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Assignee: Xin Ren
> Priority: Minor
> Fix For: 2.2.0
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel summary should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923701#comment-15923701 ] Xin Ren commented on SPARK-19282:

Hi Nick, just to double-check that I understand you correctly: you'd like to expose the parameter `maxDepth` in the Python module too, right?

> RandomForestRegressionModel summary should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark, SparkR
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Assignee: Xin Ren
> Priority: Minor
> Fix For: 2.2.0
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19866) Add local version of Word2Vec findSynonyms for spark.ml: Python API
[ https://issues.apache.org/jira/browse/SPARK-19866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902522#comment-15902522 ] Xin Ren commented on SPARK-19866:

I can try this one :)

> Add local version of Word2Vec findSynonyms for spark.ml: Python API
>
> Key: SPARK-19866
> URL: https://issues.apache.org/jira/browse/SPARK-19866
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Affects Versions: 2.2.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> Add Python API for findSynonymsArray matching the Scala API in the linked JIRA.
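The Scala side of the linked JIRA returns synonyms locally as an array instead of a DataFrame; a minimal sketch of the call the Python API would mirror (assumes a fitted {{ml.feature.Word2VecModel}}):

{code}
import org.apache.spark.ml.feature.Word2VecModel

// findSynonymsArray returns the top-num (word, cosineSimilarity) pairs
// locally, without launching a distributed job.
def topSynonyms(model: Word2VecModel, word: String, num: Int): Array[(String, Double)] =
  model.findSynonymsArray(word, num)
{code}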
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858744#comment-15858744 ] Xin Ren commented on SPARK-19282:

I just got approval from my company to work on this one, so I'm resuming my work on this task :)

> RandomForestRegressionModel should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833973#comment-15833973 ] Xin Ren commented on SPARK-19282:

Sorry Nick, I can't make time for this fix now. Could anyone else please take a look? Thanks a lot.

> RandomForestRegressionModel should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832022#comment-15832022 ] Xin Ren commented on SPARK-19282:

Thank you Nick. I'll give fixing it a try. :)

> RandomForestRegressionModel should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831342#comment-15831342 ] Xin Ren commented on SPARK-19282:

Sorry for being naive, I'm not familiar with random forests, but is "max depth" an important metric/param of an RF model?

> RandomForestRegressionModel should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-18907) Fix flaky test: o.a.s.sql.streaming.FileStreamSourceSuite max files per trigger - incorrect values
[ https://issues.apache.org/jira/browse/SPARK-18907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826924#comment-15826924 ] Xin Ren commented on SPARK-18907:

Thanks Shixiong :P

> Fix flaky test: o.a.s.sql.streaming.FileStreamSourceSuite max files per trigger - incorrect values
>
> Key: SPARK-18907
> URL: https://issues.apache.org/jira/browse/SPARK-18907
> Project: Spark
> Issue Type: Test
> Components: Structured Streaming, Tests
> Reporter: Shixiong Zhu
> Priority: Minor
[jira] [Commented] (SPARK-18907) Fix flaky test: o.a.s.sql.streaming.FileStreamSourceSuite max files per trigger - incorrect values
[ https://issues.apache.org/jira/browse/SPARK-18907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826909#comment-15826909 ] Xin Ren commented on SPARK-18907:

Hi Shixiong, what do you mean by flaky? Should we intercept these exceptions? I'm not sure about the expected outcome of this test case, thanks a lot.

{code}
13:54:38.697 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13:54:50.257 ERROR org.apache.spark.sql.execution.streaming.StreamExecution: Query maxFilesPerTrigger_test [id = 118d8397-3dab-49f4-a1e7-eb8ec7dd4fc2, runId = ae622c2d-eb0e-4647-beca-907b5dac59b0] terminated with error
java.lang.IllegalArgumentException: Invalid value 'not-a-integer' for option 'maxFilesPerTrigger', must be a positive integer
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2$$anonfun$apply$3.apply(FileStreamOptions.scala:35)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2$$anonfun$apply$3.apply(FileStreamOptions.scala:35)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2.apply(FileStreamOptions.scala:34)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2.apply(FileStreamOptions.scala:33)
  at scala.Option.map(Option.scala:146)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions.<init>(FileStreamOptions.scala:33)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions.<init>(FileStreamOptions.scala:31)
  at org.apache.spark.sql.execution.streaming.FileStreamSource.<init>(FileStreamSource.scala:44)
  at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:256)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:140)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:136)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:287)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:277)
  at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan$lzycompute(StreamExecution.scala:136)
  at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan(StreamExecution.scala:131)
  at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:246)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:186)
13:54:52.364 ERROR org.apache.spark.sql.execution.streaming.StreamExecution: Query maxFilesPerTrigger_test [id = 6c42063d-39f4-4722-b529-fc3d379c691d, runId = df0c4fae-49db-4be3-8270-662fe0947559] terminated with error
java.lang.IllegalArgumentException: Invalid value '-1' for option 'maxFilesPerTrigger', must be a positive integer
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2$$anonfun$apply$3.apply(FileStreamOptions.scala:35)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2$$anonfun$apply$3.apply(FileStreamOptions.scala:35)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2.apply(FileStreamOptions.scala:34)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2.apply(FileStreamOptions.scala:33)
  at scala.Option.map(Option.scala:146)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions.<init>(FileStreamOptions.scala:33)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions.<init>(FileStreamOptions.scala:31)
  at org.apache.spark.sql.execution.streaming.FileStreamSource.<init>(FileStreamSource.scala:44)
  at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:256)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:140)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:136)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at
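On the question of intercepting the exceptions: a common shape for deflaking this kind of negative test is to fail the query deterministically and assert on the cause, rather than racing the stream thread's logging. A rough sketch under that assumption (not the actual patch; `spark` and `path` are assumed to come from the test suite):

{code}
import org.scalatest.Assertions._
import org.apache.spark.sql.streaming.StreamingQueryException

// Start a file stream with an invalid maxFilesPerTrigger, wait for the
// query thread to fail, and assert on the root cause deterministically.
val e = intercept[StreamingQueryException] {
  val query = spark.readStream
    .format("text")
    .option("maxFilesPerTrigger", "not-a-integer") // invalid on purpose
    .load(path)                                    // `path` is assumed
    .writeStream
    .format("memory")
    .queryName("maxFilesPerTrigger_test")
    .start()
  query.awaitTermination()
}
assert(e.getCause.isInstanceOf[IllegalArgumentException])
{code}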
[jira] [Commented] (SPARK-17724) Unevaluated new lines in tooltip in DAG Visualization of a job
[ https://issues.apache.org/jira/browse/SPARK-17724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543994#comment-15543994 ] Xin Ren commented on SPARK-17724:

I can give it a try.

> Unevaluated new lines in tooltip in DAG Visualization of a job
>
> Key: SPARK-17724
> URL: https://issues.apache.org/jira/browse/SPARK-17724
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 2.1.0
> Reporter: Jacek Laskowski
> Priority: Minor
> Attachments: spark-webui-job-details-dagvisualization-newlines-broken.png
>
> The tooltips in DAG Visualization for a job show new lines verbatim (unevaluated).
[jira] [Created] (SPARK-17628) Name of "object StreamingExamples" should be more self-explanatory
Xin Ren created SPARK-17628:

Summary: Name of "object StreamingExamples" should be more self-explanatory
Key: SPARK-17628
URL: https://issues.apache.org/jira/browse/SPARK-17628
Project: Spark
Issue Type: Bug
Components: Examples, Streaming
Affects Versions: 2.0.0
Reporter: Xin Ren
Priority: Minor

`object StreamingExamples` is more of a utility object, and the name is too general; at first I thought it was an actual streaming example.

{code}
/** Utility functions for Spark Streaming examples. */
object StreamingExamples extends Logging {

  /** Set reasonable logging levels for streaming if the user has not configured log4j. */
  def setStreamingLogLevels() {
    val log4jInitialized = Logger.getRootLogger.getAllAppenders.hasMoreElements
    if (!log4jInitialized) {
      // We first log something to initialize Spark's default logging, then we override the
      // logging level.
      logInfo("Setting log level to [WARN] for streaming example." +
        " To override add a custom log4j.properties to the classpath.")
      Logger.getRootLogger.setLevel(Level.WARN)
    }
  }
}
{code}
[jira] [Commented] (SPARK-17476) Proper handling for unseen labels in logistic regression training.
[ https://issues.apache.org/jira/browse/SPARK-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478517#comment-15478517 ] Xin Ren commented on SPARK-17476:

Hi, I can try to work on this one, thanks :)

> Proper handling for unseen labels in logistic regression training.
>
> Key: SPARK-17476
> URL: https://issues.apache.org/jira/browse/SPARK-17476
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Seth Hendrickson
>
> Now that logistic regression supports multiclass, it is possible to train on data that has {{K}} classes, but one or more of the classes does not appear in training. For example,
> {code}
> (0.0, x1)
> (2.0, x2)
> ...
> {code}
> Currently, logistic regression assumes that the outcome classes in the above dataset have three levels: {{0, 1, 2}}. Since label 1 never appears, it should never be predicted. In theory, the coefficients should be zero and the intercept should be negative infinity. This can cause problems since we center the intercepts after training.
> We should discuss whether or not the intercepts actually tend to -infinity in practice, and whether or not we should even include them in training.
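The negative-infinity claim follows directly from the likelihood; a short sketch under the standard softmax parameterization (notation mine, not from the issue):

{code}
\[
  P(y = k \mid x) \;=\; \frac{\exp(\beta_{0k} + \beta_k^{\top} x)}
                             {\sum_{j=0}^{K-1} \exp(\beta_{0j} + \beta_j^{\top} x)}
\]
% If class k never occurs in training, every term of the log-likelihood
% increases as P(y = k | x) decreases, so the optimizer pushes
% beta_k = 0 and beta_{0k} -> -infinity: the maximum lies on the boundary,
% and the fitted intercept diverges rather than converging.
{code}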
[jira] [Updated] (SPARK-17276) Stop environment parameters flooding Jenkins build output
[ https://issues.apache.org/jira/browse/SPARK-17276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-17276:

Attachment: Screen Shot 2016-08-26 at 10.52.07 PM.png

> Stop environment parameters flooding Jenkins build output
>
> Key: SPARK-17276
> URL: https://issues.apache.org/jira/browse/SPARK-17276
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, Tests
> Affects Versions: 2.0.0
> Reporter: Xin Ren
> Priority: Minor
> Attachments: Screen Shot 2016-08-26 at 10.52.07 PM.png
>
> When I was trying to find an error msg in a failed Jenkins build job, I was annoyed by the huge env output. The env parameter output should be muted.
> {code}
> [info] PipedRDDSuite:
> [info] - basic pipe (51 milliseconds)
> 0 0 0
> [info] - basic pipe with tokenization (60 milliseconds)
> [info] - failure in iterating over pipe input (49 milliseconds)
> [info] - advanced pipe (100 milliseconds)
> [info] - pipe with empty partition (117 milliseconds)
> PATH=/home/anaconda/envs/py3k/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:/usr/java/jdk1.8.0_60/bin:/home/jenkins/tools/hudson.model.JDK/JDK_7u60/bin:/home/jenkins/.cargo/bin:/home/anaconda/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.1.1/bin/:/home/android-sdk/:/usr/local/bin:/bin:/usr/bin:/home/anaconda/envs/py3k/bin
> BUILD_CAUSE_GHPRBCAUSE=true
> SBT_MAVEN_PROFILES=-Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver
> HUDSON_HOME=/var/lib/jenkins
> AWS_SECRET_ACCESS_KEY=
> JOB_URL=https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
> HUDSON_COOKIE=638da3d2-d27a-4724-b41a-5ff6e8ce6752
> LINES=24
> CURRENT_BLOCK=18
> ANDROID_HOME=/home/android-sdk/
> ghprbActualCommit=70a751c6959048e65c083ab775b01523da4578a2
> ghprbSourceBranch=codeWalkThroughML
> GITHUB_OAUTH_KEY=
> MAIL=/var/mail/jenkins
> AMPLAB_JENKINS=1
> JENKINS_SERVER_COOKIE=472906e9832aeb79
> ghprbPullTitle=[MINOR][MLlib][SQL] Clean up unused variables and unused import
> LOGNAME=jenkins
> PWD=/home/jenkins/workspace/SparkPullRequestBuilder
> JENKINS_URL=https://amplab.cs.berkeley.edu/jenkins/
> SPARK_VERSIONS_SUITE_IVY_PATH=/home/sparkivy/per-executor-caches/9/.ivy2
> ROOT_BUILD_CAUSE_GHPRBCAUSE=true
> ghprbActualCommitAuthorEmail=iamsh...@126.com
> ghprbTargetBranch=master
> BUILD_TAG=jenkins-SparkPullRequestBuilder-64504
> SHELL=/bin/bash
> ROOT_BUILD_CAUSE=GHPRBCAUSE
> SBT_OPTS=-Duser.home=/home/sparkivy/per-executor-caches/9 -Dsbt.ivy.home=/home/sparkivy/per-executor-caches/9/.ivy2
> JENKINS_HOME=/var/lib/jenkins
> sha1=origin/pr/14836/merge
> ghprbPullDescription=GitHub pull request #14836 of commit 70a751c6959048e65c083ab775b01523da4578a2 automatically merged.
> NODE_NAME=amp-jenkins-worker-02
> BUILD_DISPLAY_NAME=#64504
> JAVA_7_HOME=/usr/java/jdk1.7.0_79
> GIT_BRANCH=codeWalkThroughML
> SHLVL=3
> AMP_JENKINS_PRB=true
> JAVA_HOME=/usr/java/jdk1.8.0_60
> JENKINS_MASTER_HOSTNAME=amp-jenkins-master
> BUILD_ID=64504
> XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
> ghprbPullLink=https://api.github.com/repos/apache/spark/pulls/14836
> JOB_NAME=SparkPullRequestBuilder
> BUILD_CAUSE=GHPRBCAUSE
> SPARK_SCALA_VERSION=2.11
> AWS_ACCESS_KEY_ID=
> NODE_LABELS=amp-jenkins-worker-02 centos spark-compile spark-test
> HUDSON_URL=https://amplab.cs.berkeley.edu/jenkins/
> SPARK_PREPEND_CLASSES=1
> COLUMNS=80
> WORKSPACE=/home/jenkins/workspace/SparkPullRequestBuilder
> SPARK_TESTING=1
> _=/usr/java/jdk1.8.0_60/bin/java
> GIT_COMMIT=b31b82bcc9d8767561ee720c9e7192252f4fd3fc
> ghprbPullId=14836
> EXECUTOR_NUMBER=9
> SSH_CLIENT=192.168.10.10 44762 22
> HUDSON_SERVER_COOKIE=472906e9832aeb79
> cat: nonexistent_file: No such file or directory
> cat: nonexistent_file: No such file or directory
>
[jira] [Commented] (SPARK-17276) Stop environment parameters flooding Jenkins build output
[ https://issues.apache.org/jira/browse/SPARK-17276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440782#comment-15440782 ] Xin Ren commented on SPARK-17276:

I'm working on it.

> Stop environment parameters flooding Jenkins build output
>
> Key: SPARK-17276
> URL: https://issues.apache.org/jira/browse/SPARK-17276
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, Tests
> Affects Versions: 2.0.0
> Reporter: Xin Ren
> Priority: Minor
> Attachments: Screen Shot 2016-08-26 at 10.52.07 PM.png
>
> When I was trying to find an error msg in a failed Jenkins build job, I was annoyed by the huge env output. The env parameter output should be muted.
> {code}
> [info] PipedRDDSuite:
> [info] - basic pipe (51 milliseconds)
> 0 0 0
> [info] - basic pipe with tokenization (60 milliseconds)
> [info] - failure in iterating over pipe input (49 milliseconds)
> [info] - advanced pipe (100 milliseconds)
> [info] - pipe with empty partition (117 milliseconds)
> PATH=/home/anaconda/envs/py3k/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:/usr/java/jdk1.8.0_60/bin:/home/jenkins/tools/hudson.model.JDK/JDK_7u60/bin:/home/jenkins/.cargo/bin:/home/anaconda/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.1.1/bin/:/home/android-sdk/:/usr/local/bin:/bin:/usr/bin:/home/anaconda/envs/py3k/bin
> BUILD_CAUSE_GHPRBCAUSE=true
> SBT_MAVEN_PROFILES=-Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver
> HUDSON_HOME=/var/lib/jenkins
> AWS_SECRET_ACCESS_KEY=
> JOB_URL=https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
> HUDSON_COOKIE=638da3d2-d27a-4724-b41a-5ff6e8ce6752
> LINES=24
> CURRENT_BLOCK=18
> ANDROID_HOME=/home/android-sdk/
> ghprbActualCommit=70a751c6959048e65c083ab775b01523da4578a2
> ghprbSourceBranch=codeWalkThroughML
> GITHUB_OAUTH_KEY=
> MAIL=/var/mail/jenkins
> AMPLAB_JENKINS=1
> JENKINS_SERVER_COOKIE=472906e9832aeb79
> ghprbPullTitle=[MINOR][MLlib][SQL] Clean up unused variables and unused import
> LOGNAME=jenkins
> PWD=/home/jenkins/workspace/SparkPullRequestBuilder
> JENKINS_URL=https://amplab.cs.berkeley.edu/jenkins/
> SPARK_VERSIONS_SUITE_IVY_PATH=/home/sparkivy/per-executor-caches/9/.ivy2
> ROOT_BUILD_CAUSE_GHPRBCAUSE=true
> ghprbActualCommitAuthorEmail=iamsh...@126.com
> ghprbTargetBranch=master
> BUILD_TAG=jenkins-SparkPullRequestBuilder-64504
> SHELL=/bin/bash
> ROOT_BUILD_CAUSE=GHPRBCAUSE
> SBT_OPTS=-Duser.home=/home/sparkivy/per-executor-caches/9 -Dsbt.ivy.home=/home/sparkivy/per-executor-caches/9/.ivy2
> JENKINS_HOME=/var/lib/jenkins
> sha1=origin/pr/14836/merge
> ghprbPullDescription=GitHub pull request #14836 of commit 70a751c6959048e65c083ab775b01523da4578a2 automatically merged.
> NODE_NAME=amp-jenkins-worker-02
> BUILD_DISPLAY_NAME=#64504
> JAVA_7_HOME=/usr/java/jdk1.7.0_79
> GIT_BRANCH=codeWalkThroughML
> SHLVL=3
> AMP_JENKINS_PRB=true
> JAVA_HOME=/usr/java/jdk1.8.0_60
> JENKINS_MASTER_HOSTNAME=amp-jenkins-master
> BUILD_ID=64504
> XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
> ghprbPullLink=https://api.github.com/repos/apache/spark/pulls/14836
> JOB_NAME=SparkPullRequestBuilder
> BUILD_CAUSE=GHPRBCAUSE
> SPARK_SCALA_VERSION=2.11
> AWS_ACCESS_KEY_ID=
> NODE_LABELS=amp-jenkins-worker-02 centos spark-compile spark-test
> HUDSON_URL=https://amplab.cs.berkeley.edu/jenkins/
> SPARK_PREPEND_CLASSES=1
> COLUMNS=80
> WORKSPACE=/home/jenkins/workspace/SparkPullRequestBuilder
> SPARK_TESTING=1
> _=/usr/java/jdk1.8.0_60/bin/java
> GIT_COMMIT=b31b82bcc9d8767561ee720c9e7192252f4fd3fc
> ghprbPullId=14836
> EXECUTOR_NUMBER=9
> SSH_CLIENT=192.168.10.10 44762 22
> HUDSON_SERVER_COOKIE=472906e9832aeb79
> cat: nonexistent_file: No such file or directory
> cat: nonexistent_file: No such file or directory
>
[jira] [Created] (SPARK-17276) Stop environment parameters flooding Jenkins build output
Xin Ren created SPARK-17276:

Summary: Stop environment parameters flooding Jenkins build output
Key: SPARK-17276
URL: https://issues.apache.org/jira/browse/SPARK-17276
Project: Spark
Issue Type: Improvement
Components: Spark Core, Tests
Affects Versions: 2.0.0
Reporter: Xin Ren
Priority: Minor

When I was trying to find an error msg in a failed Jenkins build job, I was annoyed by the huge env output. The env parameter output should be muted.

{code}
[info] PipedRDDSuite:
[info] - basic pipe (51 milliseconds)
0 0 0
[info] - basic pipe with tokenization (60 milliseconds)
[info] - failure in iterating over pipe input (49 milliseconds)
[info] - advanced pipe (100 milliseconds)
[info] - pipe with empty partition (117 milliseconds)
PATH=/home/anaconda/envs/py3k/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:/usr/java/jdk1.8.0_60/bin:/home/jenkins/tools/hudson.model.JDK/JDK_7u60/bin:/home/jenkins/.cargo/bin:/home/anaconda/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.1.1/bin/:/home/android-sdk/:/usr/local/bin:/bin:/usr/bin:/home/anaconda/envs/py3k/bin
BUILD_CAUSE_GHPRBCAUSE=true
SBT_MAVEN_PROFILES=-Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver
HUDSON_HOME=/var/lib/jenkins
AWS_SECRET_ACCESS_KEY=
JOB_URL=https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
HUDSON_COOKIE=638da3d2-d27a-4724-b41a-5ff6e8ce6752
LINES=24
CURRENT_BLOCK=18
ANDROID_HOME=/home/android-sdk/
ghprbActualCommit=70a751c6959048e65c083ab775b01523da4578a2
ghprbSourceBranch=codeWalkThroughML
GITHUB_OAUTH_KEY=
MAIL=/var/mail/jenkins
AMPLAB_JENKINS=1
JENKINS_SERVER_COOKIE=472906e9832aeb79
ghprbPullTitle=[MINOR][MLlib][SQL] Clean up unused variables and unused import
LOGNAME=jenkins
PWD=/home/jenkins/workspace/SparkPullRequestBuilder
JENKINS_URL=https://amplab.cs.berkeley.edu/jenkins/
SPARK_VERSIONS_SUITE_IVY_PATH=/home/sparkivy/per-executor-caches/9/.ivy2
ROOT_BUILD_CAUSE_GHPRBCAUSE=true
ghprbActualCommitAuthorEmail=iamsh...@126.com
ghprbTargetBranch=master
BUILD_TAG=jenkins-SparkPullRequestBuilder-64504
SHELL=/bin/bash
ROOT_BUILD_CAUSE=GHPRBCAUSE
SBT_OPTS=-Duser.home=/home/sparkivy/per-executor-caches/9 -Dsbt.ivy.home=/home/sparkivy/per-executor-caches/9/.ivy2
JENKINS_HOME=/var/lib/jenkins
sha1=origin/pr/14836/merge
ghprbPullDescription=GitHub pull request #14836 of commit 70a751c6959048e65c083ab775b01523da4578a2 automatically merged.
NODE_NAME=amp-jenkins-worker-02
BUILD_DISPLAY_NAME=#64504
JAVA_7_HOME=/usr/java/jdk1.7.0_79
GIT_BRANCH=codeWalkThroughML
SHLVL=3
AMP_JENKINS_PRB=true
JAVA_HOME=/usr/java/jdk1.8.0_60
JENKINS_MASTER_HOSTNAME=amp-jenkins-master
BUILD_ID=64504
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
ghprbPullLink=https://api.github.com/repos/apache/spark/pulls/14836
JOB_NAME=SparkPullRequestBuilder
BUILD_CAUSE=GHPRBCAUSE
SPARK_SCALA_VERSION=2.11
AWS_ACCESS_KEY_ID=
NODE_LABELS=amp-jenkins-worker-02 centos spark-compile spark-test
HUDSON_URL=https://amplab.cs.berkeley.edu/jenkins/
SPARK_PREPEND_CLASSES=1
COLUMNS=80
WORKSPACE=/home/jenkins/workspace/SparkPullRequestBuilder
SPARK_TESTING=1
_=/usr/java/jdk1.8.0_60/bin/java
GIT_COMMIT=b31b82bcc9d8767561ee720c9e7192252f4fd3fc
ghprbPullId=14836
EXECUTOR_NUMBER=9
SSH_CLIENT=192.168.10.10 44762 22
HUDSON_SERVER_COOKIE=472906e9832aeb79
cat: nonexistent_file: No such file or directory
cat: nonexistent_file: No such file or directory
[jira] [Commented] (SPARK-17241) SparkR spark.glm should have configurable regularization parameter
[ https://issues.apache.org/jira/browse/SPARK-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437588#comment-15437588 ] Xin Ren commented on SPARK-17241:

I can work on this one :)

> SparkR spark.glm should have configurable regularization parameter
>
> Key: SPARK-17241
> URL: https://issues.apache.org/jira/browse/SPARK-17241
> Project: Spark
> Issue Type: Improvement
> Reporter: Junyang Qian
>
> Spark has a configurable L2 regularization parameter for generalized linear regression. It is very important to have it in SparkR so that users can run ridge regression.
[jira] [Updated] (SPARK-17174) Provide support for Timestamp type Column in add_months function to return HH:mm:ss
[ https://issues.apache.org/jira/browse/SPARK-17174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-17174:

Component/s: SQL

> Provide support for Timestamp type Column in add_months function to return HH:mm:ss
>
> Key: SPARK-17174
> URL: https://issues.apache.org/jira/browse/SPARK-17174
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 2.0.0
> Reporter: Amit Baghel
> Priority: Minor
>
> The add_months function currently supports Date types. If the Column is Timestamp type, it adds months to the date but doesn't return the timestamp part (HH:mm:ss). See the code below.
> {code}
> import java.util.Calendar
> val now = Calendar.getInstance().getTime()
> val df = sc.parallelize((0 to 3).map(i => {now.setMonth(i); (i, new java.sql.Timestamp(now.getTime))}).toSeq).toDF("ID", "DateWithTS")
> df.withColumn("NewDateWithTS", add_months(df("DateWithTS"),1)).show
> {code}
> The code above gives the following response. Note that HH:mm:ss is missing from the NewDateWithTS column.
> {code}
> +---+--------------------+-------------+
> | ID|          DateWithTS|NewDateWithTS|
> +---+--------------------+-------------+
> |  0|2016-01-21 09:38:...|   2016-02-21|
> |  1|2016-02-21 09:38:...|   2016-03-21|
> |  2|2016-03-21 09:38:...|   2016-04-21|
> |  3|2016-04-21 09:38:...|   2016-05-21|
> +---+--------------------+-------------+
> {code}
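One possible workaround until this is addressed: interval arithmetic on a timestamp column preserves the time-of-day part, unlike {{add_months}}, which returns a date. A minimal sketch reusing the {{df}} from the description:

{code}
import org.apache.spark.sql.functions.expr

// Adding a calendar interval keeps HH:mm:ss, unlike add_months.
val withMonthAdded = df.withColumn("NewDateWithTS", expr("DateWithTS + INTERVAL 1 month"))
withMonthAdded.show(truncate = false)
{code}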
[jira] [Commented] (SPARK-17157) Add multiclass logistic regression SparkR Wrapper
[ https://issues.apache.org/jira/browse/SPARK-17157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428751#comment-15428751 ] Xin Ren commented on SPARK-17157:

I guess a lot more ML algorithms are still missing R wrappers?

> Add multiclass logistic regression SparkR Wrapper
>
> Key: SPARK-17157
> URL: https://issues.apache.org/jira/browse/SPARK-17157
> Project: Spark
> Issue Type: New Feature
> Components: SparkR
> Reporter: Miao Wang
>
> [SPARK-7159][ML] Add multiclass logistic regression to Spark ML has been merged to master. I opened this JIRA for discussion of adding a SparkR wrapper for multiclass logistic regression.
[jira] [Commented] (SPARK-17133) Improvements to linear methods in Spark
[ https://issues.apache.org/jira/browse/SPARK-17133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427144#comment-15427144 ] Xin Ren commented on SPARK-17133:

Hi [~sethah], I'd like to help on this, please count me in. Thanks a lot :)

> Improvements to linear methods in Spark
>
> Key: SPARK-17133
> URL: https://issues.apache.org/jira/browse/SPARK-17133
> Project: Spark
> Issue Type: Umbrella
> Components: ML, MLlib
> Reporter: Seth Hendrickson
>
> This JIRA is for tracking several improvements that we should make to linear/logistic regression in Spark.
[jira] [Commented] (SPARK-17038) StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
[ https://issues.apache.org/jira/browse/SPARK-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424017#comment-15424017 ] Xin Ren commented on SPARK-17038:

Hi [~ozzieba], I guess you're too busy to respond or submit a PR, so I'm submitting a PR now. Really sorry for not waiting longer.

> StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
>
> Key: SPARK-17038
> URL: https://issues.apache.org/jira/browse/SPARK-17038
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.6.2, 2.0.0
> Reporter: Oz Ben-Ami
> Priority: Minor
> Labels: metrics
>
> StreamingSource's lastReceivedBatch_submissionTime, lastReceivedBatch_processingTimeStart, and lastReceivedBatch_processingTimeEnd all use data from lastCompletedBatch instead of lastReceivedBatch. In particular, this makes it impossible to match lastReceivedBatch_records with a batchID/submission time.
> This is apparent when looking at StreamingSource.scala, lines 89-94.
> I would guess that just replacing Completed->Received in those lines would fix the issue, but I haven't tested it.
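The suspected fix is essentially a one-word change; a self-contained model of it (class and method names here are illustrative stand-ins, not the actual contents of StreamingSource.scala):

{code}
// Model of the bug: gauges labelled lastReceivedBatch_* must read the
// received-batch info, not the completed-batch info.
case class BatchUIData(submissionTime: Long)

class ListenerModel(
    val lastReceivedBatch: Option[BatchUIData],
    val lastCompletedBatch: Option[BatchUIData])

def lastReceivedBatchSubmissionTime(l: ListenerModel): Long =
  l.lastReceivedBatch.map(_.submissionTime).getOrElse(-1L) // was: lastCompletedBatch
{code}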
[jira] [Issue Comment Deleted] (SPARK-17038) StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
[ https://issues.apache.org/jira/browse/SPARK-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-17038:

Comment: was deleted

(was: Hi [~ozzieba], I guess you're too busy to respond or submit a PR, so I'm submitting a PR now. Really sorry for not waiting longer.)

> StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
>
> Key: SPARK-17038
> URL: https://issues.apache.org/jira/browse/SPARK-17038
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.6.2, 2.0.0
> Reporter: Oz Ben-Ami
> Priority: Minor
> Labels: metrics
>
> StreamingSource's lastReceivedBatch_submissionTime, lastReceivedBatch_processingTimeStart, and lastReceivedBatch_processingTimeEnd all use data from lastCompletedBatch instead of lastReceivedBatch. In particular, this makes it impossible to match lastReceivedBatch_records with a batchID/submission time.
> This is apparent when looking at StreamingSource.scala, lines 89-94.
> I would guess that just replacing Completed->Received in those lines would fix the issue, but I haven't tested it.
[jira] [Commented] (SPARK-17038) StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
[ https://issues.apache.org/jira/browse/SPARK-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424015#comment-15424015 ] Xin Ren commented on SPARK-17038:

Hi [~ozzieba], I guess you're too busy to respond or submit a PR, so I'm submitting a PR now. Really sorry for not waiting longer.

> StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
>
> Key: SPARK-17038
> URL: https://issues.apache.org/jira/browse/SPARK-17038
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.6.2, 2.0.0
> Reporter: Oz Ben-Ami
> Priority: Minor
> Labels: metrics
>
> StreamingSource's lastReceivedBatch_submissionTime, lastReceivedBatch_processingTimeStart, and lastReceivedBatch_processingTimeEnd all use data from lastCompletedBatch instead of lastReceivedBatch. In particular, this makes it impossible to match lastReceivedBatch_records with a batchID/submission time.
> This is apparent when looking at StreamingSource.scala, lines 89-94.
> I would guess that just replacing Completed->Received in those lines would fix the issue, but I haven't tested it.
[jira] [Commented] (SPARK-17038) StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
[ https://issues.apache.org/jira/browse/SPARK-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421789#comment-15421789 ] Xin Ren commented on SPARK-17038:

Hi [~ozzieba], if you don't have time, I can just submit a quick patch for this :)

> StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
>
> Key: SPARK-17038
> URL: https://issues.apache.org/jira/browse/SPARK-17038
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.6.2, 2.0.0
> Reporter: Oz Ben-Ami
> Priority: Minor
> Labels: metrics
>
> StreamingSource's lastReceivedBatch_submissionTime, lastReceivedBatch_processingTimeStart, and lastReceivedBatch_processingTimeEnd all use data from lastCompletedBatch instead of lastReceivedBatch. In particular, this makes it impossible to match lastReceivedBatch_records with a batchID/submission time.
> This is apparent when looking at StreamingSource.scala, lines 89-94.
> I would guess that just replacing Completed->Received in those lines would fix the issue, but I haven't tested it.
[jira] [Resolved] (SPARK-17026) warning msg in MulticlassMetricsSuite
[ https://issues.apache.org/jira/browse/SPARK-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren resolved SPARK-17026.

Resolution: Not A Problem

> warning msg in MulticlassMetricsSuite
>
> Key: SPARK-17026
> URL: https://issues.apache.org/jira/browse/SPARK-17026
> Project: Spark
> Issue Type: Improvement
> Reporter: Xin Ren
> Priority: Trivial
>
> Got warnings when building:
> {code}
> [warn] /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala:74: value precision in class MulticlassMetrics is deprecated: Use accuracy.
> [warn] assert(math.abs(metrics.accuracy - metrics.precision) < delta)
> [warn] ^
> [warn] /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala:75: value recall in class MulticlassMetrics is deprecated: Use accuracy.
> [warn] assert(math.abs(metrics.accuracy - metrics.recall) < delta)
> [warn] ^
> [warn] /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala:76: value fMeasure in class MulticlassMetrics is deprecated: Use accuracy.
> [warn] assert(math.abs(metrics.accuracy - metrics.fMeasure) < delta)
> [warn] ^
> {code}
> And `precision`, `recall`, and `fMeasure` are all referencing `accuracy`:
> {code}
> assert(math.abs(metrics.accuracy - metrics.precision) < delta)
> assert(math.abs(metrics.accuracy - metrics.recall) < delta)
> assert(math.abs(metrics.accuracy - metrics.fMeasure) < delta)
> {code}
> {code}
> /**
>  * Returns precision
>  */
> @Since("1.1.0")
> @deprecated("Use accuracy.", "2.0.0")
> lazy val precision: Double = accuracy
>
> /**
>  * Returns recall
>  * (equals to precision for multiclass classifier
>  * because sum of all false positives is equal to sum
>  * of all false negatives)
>  */
> @Since("1.1.0")
> @deprecated("Use accuracy.", "2.0.0")
> lazy val recall: Double = accuracy
>
> /**
>  * Returns f-measure
>  * (equals to precision and recall because precision equals recall)
>  */
> @Since("1.1.0")
> @deprecated("Use accuracy.", "2.0.0")
> lazy val fMeasure: Double = accuracy
> {code}
[jira] [Created] (SPARK-17026) warning msg in MulticlassMetricsSuite
Xin Ren created SPARK-17026:

Summary: warning msg in MulticlassMetricsSuite
Key: SPARK-17026
URL: https://issues.apache.org/jira/browse/SPARK-17026
Project: Spark
Issue Type: Improvement
Reporter: Xin Ren
Priority: Trivial

Got warnings when building:

{code}
[warn] /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala:74: value precision in class MulticlassMetrics is deprecated: Use accuracy.
[warn] assert(math.abs(metrics.accuracy - metrics.precision) < delta)
[warn] ^
[warn] /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala:75: value recall in class MulticlassMetrics is deprecated: Use accuracy.
[warn] assert(math.abs(metrics.accuracy - metrics.recall) < delta)
[warn] ^
[warn] /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala:76: value fMeasure in class MulticlassMetrics is deprecated: Use accuracy.
[warn] assert(math.abs(metrics.accuracy - metrics.fMeasure) < delta)
[warn] ^
{code}

And `precision`, `recall`, and `fMeasure` are all referencing `accuracy`:

{code}
assert(math.abs(metrics.accuracy - metrics.precision) < delta)
assert(math.abs(metrics.accuracy - metrics.recall) < delta)
assert(math.abs(metrics.accuracy - metrics.fMeasure) < delta)
{code}

{code}
/**
 * Returns precision
 */
@Since("1.1.0")
@deprecated("Use accuracy.", "2.0.0")
lazy val precision: Double = accuracy

/**
 * Returns recall
 * (equals to precision for multiclass classifier
 * because sum of all false positives is equal to sum
 * of all false negatives)
 */
@Since("1.1.0")
@deprecated("Use accuracy.", "2.0.0")
lazy val recall: Double = accuracy

/**
 * Returns f-measure
 * (equals to precision and recall because precision equals recall)
 */
@Since("1.1.0")
@deprecated("Use accuracy.", "2.0.0")
lazy val fMeasure: Double = accuracy
{code}
[jira] [Created] (SPARK-17005) Fix warning "method tpe in trait AnnotationApi is deprecated"
Xin Ren created SPARK-17005:

Summary: Fix warning "method tpe in trait AnnotationApi is deprecated"
Key: SPARK-17005
URL: https://issues.apache.org/jira/browse/SPARK-17005
Project: Spark
Issue Type: Improvement
Components: Examples
Affects Versions: 2.0.0
Reporter: Xin Ren
Priority: Trivial

When running the 'examples' module, there is a warning:

{code}
[warn] /Users/quickmobile/workspace/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:349: method tpe in trait AnnotationApi is deprecated: Use `tree.tpe` instead
[warn] case t if t.typeSymbol.annotations.exists(_.tpe =:= typeOf[SQLUserDefinedType]) =>
[warn] ^
[warn] /Users/quickmobile/workspace/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:551: method tpe in trait AnnotationApi is deprecated: Use `tree.tpe` instead
[warn] case t if t.typeSymbol.annotations.exists(_.tpe =:= typeOf[SQLUserDefinedType]) =>
[warn] ^
[warn] /Users/quickmobile/workspace/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:647: method tpe in trait AnnotationApi is deprecated: Use `tree.tpe` instead
[warn] case t if t.typeSymbol.annotations.exists(_.tpe =:= typeOf[SQLUserDefinedType]) =>
[warn] ^
{code}
[jira] [Created] (SPARK-17004) Fix warning "method declarations in class TypeApi is deprecated"
Xin Ren created SPARK-17004:

Summary: Fix warning "method declarations in class TypeApi is deprecated"
Key: SPARK-17004
URL: https://issues.apache.org/jira/browse/SPARK-17004
Project: Spark
Issue Type: Improvement
Components: Examples
Affects Versions: 2.0.0
Reporter: Xin Ren
Priority: Trivial

When running the 'examples' module, there is a warning:

{code}
[warn] /home/jenkins/workspace/SparkPullRequestBuilder/examples/src/main/scala/org/apache/spark/examples/mllib/AbstractParams.scala:41: method declarations in class TypeApi is deprecated: Use `decls` instead
[warn] val allAccessors = tpe.declarations.collect {
[warn] ^
[warn] one warning found
[warn] /home/jenkins/workspace/SparkPullRequestBuilder/examples/src/main/scala/org/apache/spark/examples/mllib/AbstractParams.scala:41: method declarations in class TypeApi is deprecated: Use `decls` instead
[warn] val allAccessors = tpe.declarations.collect {
[warn]
{code}
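Both this warning and the one in SPARK-17005 have mechanical replacements in the scala-reflect API; a minimal self-contained sketch of the new calls (the {{Person}} class and the {{deprecated}} annotation are stand-ins for the real code's types):

{code}
import scala.reflect.runtime.universe._

case class Person(name: String)

// SPARK-17004: Type#declarations is deprecated in favour of Type#decls.
val accessors = typeOf[Person].decls.collect {
  case m: MethodSymbol if m.isCaseAccessor => m
}

// SPARK-17005: Annotation#tpe is deprecated in favour of Annotation#tree.tpe.
val annotated = typeOf[Person].typeSymbol.annotations
  .exists(_.tree.tpe =:= typeOf[deprecated])
{code}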
[jira] [Comment Edited] (SPARK-16445) Multilayer Perceptron Classifier wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394654#comment-15394654 ] Xin Ren edited comment on SPARK-16445 at 7/26/16 10:17 PM:

I'm still working on it; hopefully by the end of this weekend I can submit a PR :)

I just have a quick question: which parameters should be passed from the R command?

For fit() of the wrapper class, there are many parameters: https://github.com/apache/spark/compare/master...keypointt:SPARK-16445?expand=1#diff-ccb8590441998a896d1b74ca605b56efR62

{code}
def fit(
    formula: String,
    data: DataFrame,
    blockSize: Int,
    layers: Array[Int],
    initialWeights: Vector,
    solver: String,
    seed: Long,
    maxIter: Int,
    tol: Double,
    stepSize: Double): MultilayerPerceptronClassifierWrapper = {
{code}

And for the R part, should I pass all the parameters from the R command? https://github.com/apache/spark/compare/master...keypointt:SPARK-16445?expand=1#diff-7ede1519b4a56647801b51af33c2dd18R461

I find that in the example (http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier), only the parameters below are being set; the rest just use default values:

{code}
val trainer = new MultilayerPerceptronClassifier()
  .setLayers(layers)
  .setBlockSize(128)
  .setSeed(1234L)
  .setMaxIter(100)
{code}

> Multilayer Perceptron Classifier wrapper in SparkR
>
> Key: SPARK-16445
> URL: https://issues.apache.org/jira/browse/SPARK-16445
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, SparkR
> Reporter: Xiangrui Meng
> Assignee: Xin Ren
>
> Follow instructions in SPARK-16442 and implement a multilayer perceptron classifier wrapper in SparkR.
[jira] [Commented] (SPARK-16445) Multilayer Perceptron Classifier wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394654#comment-15394654 ] Xin Ren commented on SPARK-16445: - I'm still working on it; hopefully by the end of this weekend I can submit a PR :) I just have a quick question: which parameters should be passed from the R command? For fit() of the wrapper class, there are many parameters: https://github.com/apache/spark/compare/master...keypointt:SPARK-16445?expand=1#diff-ccb8590441998a896d1b74ca605b56efR62 {code} def fit( formula: String, data: DataFrame, blockSize: Int, layers: Array[Int], initialWeights: Vector, solver: String, seed: Long, maxIter: Int, tol: Double, stepSize: Double ): MultilayerPerceptronClassifierWrapper = { {code} And for the R part, should I pass all the parameters from the R command? https://github.com/apache/spark/compare/master...keypointt:SPARK-16445?expand=1#diff-7ede1519b4a56647801b51af33c2dd18R461 I see in the example (http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier) that only the parameters below are being set; the rest just use default values: {code} val trainer = new MultilayerPerceptronClassifier() .setLayers(layers) .setBlockSize(128) .setSeed(1234L) .setMaxIter(100) {code} > Multilayer Perceptron Classifier wrapper in SparkR > -- > > Key: SPARK-16445 > URL: https://issues.apache.org/jira/browse/SPARK-16445 > Project: Spark > Issue Type: Sub-task > Components: MLlib, SparkR >Reporter: Xiangrui Meng >Assignee: Xin Ren > > Follow instructions in SPARK-16442 and implement multilayer perceptron > classifier wrapper in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
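One common way to answer this is to expose only the frequently tuned knobs from R and let the estimator's defaults cover the rest. A sketch under that assumption (the helper name and chosen defaults are illustrative, not the final wrapper API):

{code}
import org.apache.spark.ml.classification.{MultilayerPerceptronClassificationModel, MultilayerPerceptronClassifier}
import org.apache.spark.sql.DataFrame

// Sketch only: a reduced parameter surface, with defaults mirroring the
// programming-guide example; everything else falls back to estimator defaults.
def fitSketch(
    data: DataFrame,  // expects the usual "features"/"label" columns
    layers: Array[Int],
    blockSize: Int = 128,
    maxIter: Int = 100,
    seed: Long = 1234L): MultilayerPerceptronClassificationModel = {
  new MultilayerPerceptronClassifier()
    .setLayers(layers)
    .setBlockSize(blockSize)
    .setMaxIter(maxIter)
    .setSeed(seed)
    .fit(data)
}
{code}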
[jira] [Commented] (SPARK-16580) [WARN] class Accumulator in package spark is deprecated: use AccumulatorV2
[ https://issues.apache.org/jira/browse/SPARK-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15380169#comment-15380169 ] Xin Ren commented on SPARK-16580: - You are right, this one is hard; it's kind of all over the place. I'd like to have a try, but I'm not sure I can resolve it... I tried the modification here: https://github.com/keypointt/spark/commit/84db7265250eef147c8d51e539ace9f9dfc35a19 After compiling again, the warnings on "PythonRDD.scala:78" and "PythonRDD.scala:71" disappeared. But for "PythonRDD.scala:873: trait AccumulatorParam" and "AccumulatorV2.scala:459: trait AccumulableParam", I don't know what to do, since they support some legacy API calls > [WARN] class Accumulator in package spark is deprecated: use AccumulatorV2 > -- > > Key: SPARK-16580 > URL: https://issues.apache.org/jira/browse/SPARK-16580 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Minor > > When I was working on the R wrapper, I found the compile warn. > {code} > > project mllib > > console > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: > class Accumulator in package spark is deprecated: use AccumulatorV2 > [warn] accumulator: Accumulator[JList[Array[Byte]]]) > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: > class Accumulator in package spark is deprecated: use AccumulatorV2 > [warn] accumulator: Accumulator[JList[Array[Byte]]]) > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: > class Accumulator in package spark is deprecated: use AccumulatorV2 > [warn] accumulator: Accumulator[JList[Array[Byte]]]) > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:71: > class Accumulator in package spark is deprecated: use AccumulatorV2 > [warn] private[spark] case class PythonFunction( > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: > class Accumulator in package spark is deprecated: use AccumulatorV2 > [warn] accumulator: Accumulator[JList[Array[Byte]]]) > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:873: > trait AccumulatorParam in package spark is deprecated: use AccumulatorV2 > [warn] extends AccumulatorParam[JList[Array[Byte]]] { > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala:459: > trait AccumulableParam in package spark is deprecated: use AccumulatorV2 > [warn] param: org.apache.spark.AccumulableParam[R, T]) extends > AccumulatorV2[T, R] { > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala:459: > trait AccumulableParam in package spark is deprecated: use AccumulatorV2 > [warn] param: org.apache.spark.AccumulableParam[R, T]) extends > AccumulatorV2[T, R] { > [warn] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
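For the `Accumulator[JList[Array[Byte]]]` case specifically, the V2 route is to implement `AccumulatorV2` directly rather than an `AccumulatorParam`. A rough sketch of the shape of that migration (not the actual PythonRDD change):

{code}
import java.util.{ArrayList => JArrayList, List => JList}
import org.apache.spark.util.AccumulatorV2

// AccumulatorV2 replaces the AccumulatorParam-based API with explicit
// isZero/copy/reset/add/merge/value methods.
class BytesAccumulator extends AccumulatorV2[Array[Byte], JList[Array[Byte]]] {
  private val buf = new JArrayList[Array[Byte]]()

  override def isZero: Boolean = buf.isEmpty
  override def copy(): BytesAccumulator = {
    val acc = new BytesAccumulator
    acc.buf.addAll(buf)
    acc
  }
  override def reset(): Unit = buf.clear()
  override def add(v: Array[Byte]): Unit = buf.add(v)
  override def merge(other: AccumulatorV2[Array[Byte], JList[Array[Byte]]]): Unit =
    buf.addAll(other.value)
  override def value: JList[Array[Byte]] = buf
}
{code}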
[jira] [Created] (SPARK-16580) [WARN] class Accumulator in package spark is deprecated: use AccumulatorV2
Xin Ren created SPARK-16580: --- Summary: [WARN] class Accumulator in package spark is deprecated: use AccumulatorV2 Key: SPARK-16580 URL: https://issues.apache.org/jira/browse/SPARK-16580 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.0.0 Reporter: Xin Ren Priority: Minor When I was working on the R wrapper, I found the compile warn. {code} > project mllib > console [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: class Accumulator in package spark is deprecated: use AccumulatorV2 [warn] accumulator: Accumulator[JList[Array[Byte]]]) [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: class Accumulator in package spark is deprecated: use AccumulatorV2 [warn] accumulator: Accumulator[JList[Array[Byte]]]) [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: class Accumulator in package spark is deprecated: use AccumulatorV2 [warn] accumulator: Accumulator[JList[Array[Byte]]]) [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:71: class Accumulator in package spark is deprecated: use AccumulatorV2 [warn] private[spark] case class PythonFunction( [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: class Accumulator in package spark is deprecated: use AccumulatorV2 [warn] accumulator: Accumulator[JList[Array[Byte]]]) [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:873: trait AccumulatorParam in package spark is deprecated: use AccumulatorV2 [warn] extends AccumulatorParam[JList[Array[Byte]]] { [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala:459: trait AccumulableParam in package spark is deprecated: use AccumulatorV2 [warn] param: org.apache.spark.AccumulableParam[R, T]) extends AccumulatorV2[T, R] { [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala:459: trait AccumulableParam in package spark is deprecated: use AccumulatorV2 [warn] param: org.apache.spark.AccumulableParam[R, T]) extends AccumulatorV2[T, R] { [warn] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16535) pom.xml warning: "Definition of groupId is redundant, because it's inherited from the parent"
[ https://issues.apache.org/jira/browse/SPARK-16535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16535: Attachment: Screen Shot 2016-07-13 at 3.13.11 PM.png > pom.xml warning: "Definition of groupId is redundant, because it's inherited > from the parent" > - > > Key: SPARK-16535 > URL: https://issues.apache.org/jira/browse/SPARK-16535 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Xin Ren >Priority: Minor > Attachments: Screen Shot 2016-07-13 at 3.13.11 PM.png > > > When I scanned through the pom.xml files of the sub-projects, I found the > warning below (screenshot attached): > {code} > Definition of groupId is redundant, because it's inherited from the parent > {code} > I've tried removing some of the lines with the groupId definition, and the > build on my local machine is still OK: > {code} > <groupId>org.apache.spark</groupId> > {code} > As I just found, Maven 3.3.9 is being used in > Spark 2.x, and Maven 3 supports versionless parent elements: since Maven 3.1 > there is no need to specify the parent version in sub-modules, which is great. > ref: > http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16535) pom.xml warning: "Definition of groupId is redundant, because it's inherited from the parent"
Xin Ren created SPARK-16535: --- Summary: pom.xml warning: "Definition of groupId is redundant, because it's inherited from the parent" Key: SPARK-16535 URL: https://issues.apache.org/jira/browse/SPARK-16535 Project: Spark Issue Type: Improvement Components: Build Reporter: Xin Ren Priority: Minor Attachments: Screen Shot 2016-07-13 at 3.13.11 PM.png When I scanned through the pom.xml files of the sub-projects, I found the warning below (screenshot attached): {code} Definition of groupId is redundant, because it's inherited from the parent {code} I've tried removing some of the lines with the groupId definition, and the build on my local machine is still OK: {code} <groupId>org.apache.spark</groupId> {code} As I just found, Maven 3.3.9 is being used in Spark 2.x, and Maven 3 supports versionless parent elements: since Maven 3.1 there is no need to specify the parent version in sub-modules, which is great. ref: http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
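For reference, the shape of the change in a child module pom is just dropping the redundant element and relying on inheritance; a sketch (the artifact ids below are illustrative):

{code}
<!-- Child module pom sketch: groupId is inherited from <parent>, so the
     child does not need to restate it. -->
<parent>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-parent_2.11</artifactId>
  <version>2.0.0-SNAPSHOT</version>
  <relativePath>../pom.xml</relativePath>
</parent>

<artifactId>spark-mllib_2.11</artifactId>
<!-- no <groupId> here: inherited from the parent -->
{code}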
[jira] [Closed] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren closed SPARK-16437. --- > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren resolved SPARK-16437. - Resolution: Not A Problem > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-16502) Update deprecated method "ParquetFileReader" from parquet
[ https://issues.apache.org/jira/browse/SPARK-16502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren closed SPARK-16502. --- Resolution: Invalid > Update deprecated method "ParquetFileReader" from parquet > -- > > Key: SPARK-16502 > URL: https://issues.apache.org/jira/browse/SPARK-16502 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Xin Ren > > During code compilation, I got the deprecation warning below. The > method invocation needs to be updated. > {code} > /Users/renxin/workspace/spark/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java > Warning:(140, 19) java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > Warning:(204, 19) java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16502) Update deprecated method "ParquetFileReader" from parquet
[ https://issues.apache.org/jira/browse/SPARK-16502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16502: Description: During code compilation, I got the deprecation warning below. The method invocation needs to be updated. {code} /Users/renxin/workspace/spark/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java Warning:(140, 19) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning:(204, 19) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated {code} was: During code compilation, I got the deprecation warning below. The method invocation needs to be updated. {code} /Users/renxin/workspace/spark/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java Warning:(140, 19) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning:(204, 19) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated {code} > Update deprecated method "ParquetFileReader" from parquet > -- > > Key: SPARK-16502 > URL: https://issues.apache.org/jira/browse/SPARK-16502 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Xin Ren > > During code compilation, I got the deprecation warning below. The > method invocation needs to be updated. > {code} > /Users/renxin/workspace/spark/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java > Warning:(140, 19) java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > Warning:(204, 19) java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16502) Update deprecated method "ParquetFileReader" from parquet
[ https://issues.apache.org/jira/browse/SPARK-16502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16502: Description: During code compilation, I got the deprecation warning below. The method invocation needs to be updated. {code} /Users/renxin/workspace/spark/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java Warning:(140, 19) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning:(204, 19) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated {code} was: During code compilation, I got the deprecation warning below. The method invocation needs to be updated. {code} /Users/quickmobile/workspace/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala Warning:(448, 28) method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information. ConversionPatterns.listType( ^ Warning:(464, 28) method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information. ConversionPatterns.listType( ^ {code} > Update deprecated method "ParquetFileReader" from parquet > -- > > Key: SPARK-16502 > URL: https://issues.apache.org/jira/browse/SPARK-16502 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Xin Ren > > During code compilation, I got the deprecation warning below. The > method invocation needs to be updated. > {code} > /Users/renxin/workspace/spark/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java > Warning:(140, 19) java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > Warning:(204, 19) java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
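On the `ParquetFileReader` side, later parquet-mr releases expose a static `open(...)` factory that avoids the deprecated constructor; a sketch under that assumption (the file path is illustrative, and availability should be verified against the parquet version actually on the classpath):

{code}
// Sketch, assuming a parquet-mr release that provides ParquetFileReader.open,
// so the deprecated (Configuration, Path, List, List) constructor goes away.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader

val reader = ParquetFileReader.open(new Configuration(), new Path("/tmp/example.parquet"))
try {
  // File metadata comes from the reader instead of being plumbed in manually.
  println(reader.getFileMetaData.getSchema)
} finally {
  reader.close()
}
{code}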
[jira] [Commented] (SPARK-16502) Update deprecated method "ParquetFileReader" from parquet
[ https://issues.apache.org/jira/browse/SPARK-16502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373494#comment-15373494 ] Xin Ren commented on SPARK-16502: - I'm working on it. > Update deprecated method "ParquetFileReader" from parquet > -- > > Key: SPARK-16502 > URL: https://issues.apache.org/jira/browse/SPARK-16502 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Xin Ren > > During code compilation, I got the deprecation warning below. The > method invocation needs to be updated. > {code} > /Users/quickmobile/workspace/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala > Warning:(448, 28) method listType in object ConversionPatterns is deprecated: > see corresponding Javadoc for more information. > ConversionPatterns.listType( >^ > Warning:(464, 28) method listType in object ConversionPatterns is deprecated: > see corresponding Javadoc for more information. > ConversionPatterns.listType( >^ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16502) Update deprecated method "ParquetFileReader" from parquet
Xin Ren created SPARK-16502: --- Summary: Update deprecated method "ParquetFileReader" from parquet Key: SPARK-16502 URL: https://issues.apache.org/jira/browse/SPARK-16502 Project: Spark Issue Type: Improvement Components: SQL Reporter: Xin Ren During code compilation, I got the deprecation warning below. The method invocation needs to be updated. {code} /Users/quickmobile/workspace/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala Warning:(448, 28) method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information. ConversionPatterns.listType( ^ Warning:(464, 28) method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information. ConversionPatterns.listType( ^ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
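The deprecation note in parquet-mr points toward `listOfElements` as the `listType` replacement. A sketch under that assumption (the schema names are made up; the exact signature should be double-checked against the parquet-mr version in use):

{code}
// Sketch: build a LIST-annotated group via the replacement named in the
// deprecation javadoc, instead of the deprecated ConversionPatterns.listType.
import org.apache.parquet.schema.{ConversionPatterns, Types}
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName
import org.apache.parquet.schema.Type.Repetition

val element = Types.optional(PrimitiveTypeName.INT32).named("element")
val listField = ConversionPatterns.listOfElements(Repetition.OPTIONAL, "my_list", element)
println(listField)
{code}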
[jira] [Comment Edited] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372210#comment-15372210 ] Xin Ren edited comment on SPARK-16437 at 7/12/16 6:04 PM: -- I worked on this for a couple of days, and I found it's not caused by Spark but by the parquet library "parquet-mr/parquet-hadoop". I've debugged step by step, and found this error comes from here: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L820 and after digging into "parquet-hadoop", it's most probably because this library is missing the slf4j binder: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L231 But it's technically not a bug, since Spark is using the latest versions of slf4j and parquet {code} slf4j 1.7.16, parquet 1.8.1 {code} and since 1.6 SLF4J defaults to a no-operation (NOP) logger implementation, so it should be OK. was (Author: iamshrek): I worked on this for a couple of days, and I found it's not caused by Spark but by the parquet library "parquet-mr/parquet-hadoop". I've debugged step by step, and found this error comes from here: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L820 and after digging into "parquet-hadoop", it's most probably because this library is missing the slf4j binder: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L231 But it's technically not a bug, since Spark is using slf4j {code}1.7.16{code}, and since 1.6 SLF4J defaults to a no-operation (NOP) logger implementation, so it should be OK. > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
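If one did want real log output during local debugging, the usual approach is to drop a single SLF4J binding onto the classpath; a sketch (the binding choice is illustrative, and the version simply mirrors the slf4j version mentioned above):

{code}
<!-- Sketch: any one SLF4J binding silences the NOP fallback; slf4j-simple
     is the smallest. Version here just mirrors Spark's slf4j version. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-simple</artifactId>
  <version>1.7.16</version>
</dependency>
{code}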
[jira] [Commented] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373338#comment-15373338 ] Xin Ren commented on SPARK-16437: - hi [~srowen], could you please have a look here? I think the SLF4J error in this ticket comes from the parquet library "parquet-mr/parquet-hadoop" and is not a Spark problem. But I still have some very tiny style changes; should I submit the PR or just drop it, since it's only a 2-line change? https://github.com/apache/spark/compare/master...keypointt:SPARK-16437?expand=1 thank you very much :) > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372211#comment-15372211 ] Xin Ren commented on SPARK-16437: - I did find some minor improvements during my debugging, though, and will submit a PR tomorrow. > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372210#comment-15372210 ] Xin Ren commented on SPARK-16437: - I worked on this for a couple of days, and I found it's not caused by Spark but by the parquet library "parquet-mr/parquet-hadoop". I've debugged step by step, and found this error comes from here: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L820 and after digging into "parquet-hadoop", it's most probably because this library is missing the slf4j binder: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L231 But it's technically not a bug, since Spark is using slf4j {code}1.7.16{code}, and since 1.6 SLF4J defaults to a no-operation (NOP) logger implementation, so it should be OK. > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16445) Multilayer Perceptron Classifier wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371385#comment-15371385 ] Xin Ren commented on SPARK-16445: - great to know, I'll start on it, thanks Xiangrui > Multilayer Perceptron Classifier wrapper in SparkR > -- > > Key: SPARK-16445 > URL: https://issues.apache.org/jira/browse/SPARK-16445 > Project: Spark > Issue Type: Sub-task > Components: MLlib, SparkR >Reporter: Xiangrui Meng >Assignee: Xin Ren > > Follow instructions in SPARK-16442 and implement multilayer perceptron > classifier wrapper in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368548#comment-15368548 ] Xin Ren commented on SPARK-16437: - It's SQL's problem I think, I'll remove the SparkR tag > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16437: Component/s: (was: SparkR) > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16437: Description: build SparkR with command {code} build/mvn -DskipTests -Psparkr package {code} start SparkR console {code} ./bin/sparkR {code} then get error {code} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ SparkSession available as 'spark'. > > > library(SparkR) > > df <- read.df("examples/src/main/resources/users.parquet") SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. > > > head(df) 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl name favorite_color favorite_numbers 1 Alyssa3, 9, 15, 20 2Benred NULL {code} Reference * seems need to add a lib from slf4j to point to older version http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder was: build SparkR with command {code} build/mvn -DskipTests -Psparkr package {code} start SparkR console {code} ./bin/sparkR {code} then get error {code} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ SparkSession available as 'spark'. > > > library(SparkR) > > df <- read.df("examples/src/main/resources/users.parquet") SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. > > > head(df) 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl name favorite_color favorite_numbers 1 Alyssa3, 9, 15, 20 2Benred NULL {code} seems need to add a lib from slf4j http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SparkR, SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. 
> > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16437: Description: build SparkR with command {code} build/mvn -DskipTests -Psparkr package {code} start SparkR console {code} ./bin/sparkR {code} then get error {code} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ SparkSession available as 'spark'. > > > library(SparkR) > > df <- read.df("examples/src/main/resources/users.parquet") SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. > > > head(df) 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl name favorite_color favorite_numbers 1 Alyssa3, 9, 15, 20 2Benred NULL {code} seems need to add a lib from slf4j http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder was: start SparkR console {code} ./bin/sparkR {code} then get error {code} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ SparkSession available as 'spark'. > > > library(SparkR) > > df <- read.df("examples/src/main/resources/users.parquet") SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. > > > head(df) 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl name favorite_color favorite_numbers 1 Alyssa3, 9, 15, 20 2Benred NULL {code} seems need to add a lib from slf4j http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SparkR, SQL >Reporter: Xin Ren >Priority: Minor > Fix For: 2.0.0 > > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. 
> > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > seems need to add a lib from slf4j > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16445) Multilayer Perceptron Classifier wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368479#comment-15368479 ] Xin Ren commented on SPARK-16445: - Hi Xiangrui, may I have a try on this one? Is there a strict deadline to hit? Thanks a lot > Multilayer Perceptron Classifier wrapper in SparkR > -- > > Key: SPARK-16445 > URL: https://issues.apache.org/jira/browse/SPARK-16445 > Project: Spark > Issue Type: Sub-task > Components: MLlib, SparkR >Reporter: Xiangrui Meng > > Follow instructions in SPARK-16442 and implement multilayer perceptron > classifier wrapper in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367310#comment-15367310 ] Xin Ren commented on SPARK-16437: - I'm working on it :) > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SparkR, SQL >Reporter: Xin Ren >Priority: Minor > Fix For: 2.0.0 > > > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > seems need to add a lib from slf4j > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
Xin Ren created SPARK-16437: --- Summary: SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Key: SPARK-16437 URL: https://issues.apache.org/jira/browse/SPARK-16437 Project: Spark Issue Type: Bug Components: SparkR, SQL Reporter: Xin Ren Priority: Minor Fix For: 2.0.0 start SparkR console {code} ./bin/sparkR {code} then get error {code} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ SparkSession available as 'spark'. > > > library(SparkR) > > df <- read.df("examples/src/main/resources/users.parquet") SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. > > > head(df) 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl name favorite_color favorite_numbers 1 Alyssa3, 9, 15, 20 2Benred NULL {code} seems need to add a lib from slf4j http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16381) Update SQL examples and programming guide for R language binding
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366261#comment-15366261 ] Xin Ren commented on SPARK-16381: - Oh I see, thank you so much :) > Update SQL examples and programming guide for R language binding > > > Key: SPARK-16381 > URL: https://issues.apache.org/jira/browse/SPARK-16381 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Examples >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Xin Ren > > Please follow guidelines listed in this SPARK-16303 > [comment|https://issues.apache.org/jira/browse/SPARK-16303?focusedCommentId=15362575=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15362575]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16381) Update SQL examples and programming guide for R language binding
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365597#comment-15365597 ] Xin Ren commented on SPARK-16381: - Hi Cheng, do you mind telling me where to find the RC date or release schedule? I tried here https://issues.apache.org/jira/browse/SPARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:versions-panel, but didn't find much information > Update SQL examples and programming guide for R language binding > > > Key: SPARK-16381 > URL: https://issues.apache.org/jira/browse/SPARK-16381 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Examples >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Xin Ren > > Please follow guidelines listed in this SPARK-16303 > [comment|https://issues.apache.org/jira/browse/SPARK-16303?focusedCommentId=15362575=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15362575]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16381) Update SQL examples and programming guide for R language binding
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364664#comment-15364664 ] Xin Ren commented on SPARK-16381: - I can work on this :) Is there a strict deadline, like needing to be finished within a couple of days? > Update SQL examples and programming guide for R language binding > > > Key: SPARK-16381 > URL: https://issues.apache.org/jira/browse/SPARK-16381 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Examples >Affects Versions: 2.0.0 >Reporter: Cheng Lian > > Please follow guidelines listed in this SPARK-16303 > [comment|https://issues.apache.org/jira/browse/SPARK-16303?focusedCommentId=15362575=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15362575]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16233) test_sparkSQL.R is failing
[ https://issues.apache.org/jira/browse/SPARK-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360058#comment-15360058 ] Xin Ren commented on SPARK-16233: - Oh sorry I thought you guys would take over so I stopped working on this one. Thanks a lot resolving this (y) > test_sparkSQL.R is failing > -- > > Key: SPARK-16233 > URL: https://issues.apache.org/jira/browse/SPARK-16233 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.0.0 >Reporter: Xin Ren >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 2.0.0 > > > By running > {code} > ./R/run-tests.sh > {code} > Getting error: > {code} > xin:spark xr$ ./R/run-tests.sh > Warning: Ignoring non-spark config property: SPARK_SCALA_VERSION=2.11 > Loading required package: methods > Attaching package: ‘SparkR’ > The following object is masked from ‘package:testthat’: > describe > The following objects are masked from ‘package:stats’: > cov, filter, lag, na.omit, predict, sd, var, window > The following objects are masked from ‘package:base’: > as.data.frame, colnames, colnames<-, drop, endsWith, intersect, > rank, rbind, sample, startsWith, subset, summary, transform, union > binary functions: ... > functions on binary files: > broadcast variables: .. > functions in client.R: . > test functions in sparkR.R: .Re-using existing Spark Context. Call > sparkR.session.stop() or restart R to create a new Spark Context > Re-using existing Spark Context. Call sparkR.session.stop() or restart R > to create a new Spark Context > ... > include an external JAR in SparkContext: Warning: Ignoring non-spark config > property: SPARK_SCALA_VERSION=2.11 > .. > include R packages: > MLlib functions: .SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > .27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: > Compression: SNAPPY > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet block size to 134217728 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet page size to 1048576 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet dictionary page size to 1048576 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Dictionary is on > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Validation is off > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Writer version is: PARQUET_1_0 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Maximum row group padding size is 0 bytes > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem > columnStore to file. 
allocated memory: 65,622 > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 70B for [label] > BINARY: 1 values, 21B raw, 23B comp, 1 pages, encodings: [PLAIN, RLE, > BIT_PACKED] > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 87B for [terms, > list, element, list, element] BINARY: 2 values, 42B raw, 43B comp, 1 pages, > encodings: [PLAIN, RLE] > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 30B for > [hasIntercept] BOOLEAN: 1 values, 1B raw, 3B comp, 1 pages, encodings: > [PLAIN, BIT_PACKED] > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: > Compression: SNAPPY > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet block size to 134217728 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet page size to 1048576 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet dictionary page size to 1048576 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Dictionary is on > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Validation is off > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Writer version is: PARQUET_1_0 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Maximum row group padding size is 0 bytes > 27-Jun-2016 1:51:26 PM INFO: > org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem > columnStore to file. allocated memory: 49 > 27-Jun-2016 1:51:26 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 90B for [labels, > list, element] BINARY: 3
[jira] [Comment Edited] (SPARK-16144) Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict
[ https://issues.apache.org/jira/browse/SPARK-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355648#comment-15355648 ] Xin Ren edited comment on SPARK-16144 at 6/29/16 6:57 PM: -- Sure, thanks Xiangrui :) was (Author: iamshrek): Sure, thank Xiangrui :) > Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict > - > > Key: SPARK-16144 > URL: https://issues.apache.org/jira/browse/SPARK-16144 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Yanbo Liang > > After we grouped generic methods by the algorithm, it would be nice to add a > separate Rd for each ML generic method, in particular write.ml, read.ml, > summary, and predict, and link the implementations with seealso. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
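The change being discussed would presumably look something like this roxygen2 sketch, with a standalone Rd for the generic and {{@seealso}} links back to the per-algorithm implementations; the exact tags and wording here are illustrative assumptions, not the merged patch:

{code}
#' Saves the MLlib model to the input path
#'
#' @rdname write.ml
#' @name write.ml
#' @export
#' @seealso \link{spark.glm}, \link{spark.kmeans}, \link{spark.naiveBayes}
#' @seealso \link{read.ml}
setGeneric("write.ml", function(object, path, ...) {
  standardGeneric("write.ml")
})
{code}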
[jira] [Commented] (SPARK-16144) Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict
[ https://issues.apache.org/jira/browse/SPARK-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355648#comment-15355648 ] Xin Ren commented on SPARK-16144: - Sure, thank Xiangrui :) > Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict > - > > Key: SPARK-16144 > URL: https://issues.apache.org/jira/browse/SPARK-16144 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Yanbo Liang > > After we grouped generic methods by the algorithm, it would be nice to add a > separate Rd for each ML generic method, in particular write.ml, read.ml, > summary, and predict, and link the implementations with seealso. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16233) test_sparkSQL.R is failing
[ https://issues.apache.org/jira/browse/SPARK-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353352#comment-15353352 ] Xin Ren commented on SPARK-16233: - Actually I was just following the docs here https://github.com/keypointt/spark/tree/master/R#examples-unit-tests Maybe we should update the docs to point out that "-Phive" could be needed? {code} build/mvn -DskipTests -Psparkr package {code} {code} You can also run the unit tests for SparkR by running the script below. You need to install the testthat package first: R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")' ./R/run-tests.sh {code} > test_sparkSQL.R is failing > -- > > Key: SPARK-16233 > URL: https://issues.apache.org/jira/browse/SPARK-16233 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Minor > > By running > {code} > ./R/run-tests.sh > {code} > Getting error: > {code} > xin:spark xr$ ./R/run-tests.sh > Warning: Ignoring non-spark config property: SPARK_SCALA_VERSION=2.11 > Loading required package: methods > Attaching package: ‘SparkR’ > The following object is masked from ‘package:testthat’: > describe > The following objects are masked from ‘package:stats’: > cov, filter, lag, na.omit, predict, sd, var, window > The following objects are masked from ‘package:base’: > as.data.frame, colnames, colnames<-, drop, endsWith, intersect, > rank, rbind, sample, startsWith, subset, summary, transform, union > binary functions: ... > functions on binary files: > broadcast variables: .. > functions in client.R: . > test functions in sparkR.R: .Re-using existing Spark Context. Call > sparkR.session.stop() or restart R to create a new Spark Context > Re-using existing Spark Context. Call sparkR.session.stop() or restart R > to create a new Spark Context > ... > include an external JAR in SparkContext: Warning: Ignoring non-spark config > property: SPARK_SCALA_VERSION=2.11 > .. > include R packages: > MLlib functions: .SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > .27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: > Compression: SNAPPY > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet block size to 134217728 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet page size to 1048576 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet dictionary page size to 1048576 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Dictionary is on > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Validation is off > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Writer version is: PARQUET_1_0 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Maximum row group padding size is 0 bytes > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem > columnStore to file. 
allocated memory: 65,622 > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 70B for [label] > BINARY: 1 values, 21B raw, 23B comp, 1 pages, encodings: [PLAIN, RLE, > BIT_PACKED] > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 87B for [terms, > list, element, list, element] BINARY: 2 values, 42B raw, 43B comp, 1 pages, > encodings: [PLAIN, RLE] > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 30B for > [hasIntercept] BOOLEAN: 1 values, 1B raw, 3B comp, 1 pages, encodings: > [PLAIN, BIT_PACKED] > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: > Compression: SNAPPY > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet block size to 134217728 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet page size to 1048576 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet dictionary page size to 1048576 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Dictionary is on > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Validation is off > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Writer version is: PARQUET_1_0 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Maximum row group padding size is 0 bytes >
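For reference, the Hive-enabled build being suggested in this thread would presumably be the same Maven command with the standard {{-Phive}} profile added; this is an assumption based on the usual Spark build profiles, not a command quoted in the ticket:

{code}
# build SparkR support together with Hive support
build/mvn -DskipTests -Psparkr -Phive package
{code}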
[jira] [Commented] (SPARK-16233) test_sparkSQL.R is failing
[ https://issues.apache.org/jira/browse/SPARK-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353340#comment-15353340 ] Xin Ren commented on SPARK-16233: - this is what I used to build sparkR, should I add "-Phive"? sorry I'm new to this part. {code} build/mvn -DskipTests -Psparkr package {code} > test_sparkSQL.R is failing > -- > > Key: SPARK-16233 > URL: https://issues.apache.org/jira/browse/SPARK-16233 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Minor > > By running > {code} > ./R/run-tests.sh > {code} > Getting error: > {code} > xin:spark xr$ ./R/run-tests.sh > Warning: Ignoring non-spark config property: SPARK_SCALA_VERSION=2.11 > Loading required package: methods > Attaching package: ‘SparkR’ > The following object is masked from ‘package:testthat’: > describe > The following objects are masked from ‘package:stats’: > cov, filter, lag, na.omit, predict, sd, var, window > The following objects are masked from ‘package:base’: > as.data.frame, colnames, colnames<-, drop, endsWith, intersect, > rank, rbind, sample, startsWith, subset, summary, transform, union > binary functions: ... > functions on binary files: > broadcast variables: .. > functions in client.R: . > test functions in sparkR.R: .Re-using existing Spark Context. Call > sparkR.session.stop() or restart R to create a new Spark Context > Re-using existing Spark Context. Call sparkR.session.stop() or restart R > to create a new Spark Context > ... > include an external JAR in SparkContext: Warning: Ignoring non-spark config > property: SPARK_SCALA_VERSION=2.11 > .. > include R packages: > MLlib functions: .SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > .27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: > Compression: SNAPPY > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet block size to 134217728 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet page size to 1048576 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet dictionary page size to 1048576 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Dictionary is on > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Validation is off > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Writer version is: PARQUET_1_0 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Maximum row group padding size is 0 bytes > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem > columnStore to file. 
allocated memory: 65,622 > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 70B for [label] > BINARY: 1 values, 21B raw, 23B comp, 1 pages, encodings: [PLAIN, RLE, > BIT_PACKED] > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 87B for [terms, > list, element, list, element] BINARY: 2 values, 42B raw, 43B comp, 1 pages, > encodings: [PLAIN, RLE] > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 30B for > [hasIntercept] BOOLEAN: 1 values, 1B raw, 3B comp, 1 pages, encodings: > [PLAIN, BIT_PACKED] > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: > Compression: SNAPPY > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet block size to 134217728 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet page size to 1048576 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet dictionary page size to 1048576 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Dictionary is on > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Validation is off > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Writer version is: PARQUET_1_0 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Maximum row group padding size is 0 bytes > 27-Jun-2016 1:51:26 PM INFO: > org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem > columnStore to file. allocated memory: 49 > 27-Jun-2016 1:51:26 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 90B for [labels, > list, element] BINARY: 3 values, 50B raw, 50B comp, 1 pages,
[jira] [Commented] (SPARK-16144) Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict
[ https://issues.apache.org/jira/browse/SPARK-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352436#comment-15352436 ] Xin Ren commented on SPARK-16144: - Sorry, still trying to solve the merge conflicts; should be close to finishing... > Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict > - > > Key: SPARK-16144 > URL: https://issues.apache.org/jira/browse/SPARK-16144 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Xin Ren > > After we grouped generic methods by the algorithm, it would be nice to add a > separate Rd for each ML generic method, in particular write.ml, read.ml, > summary, and predict, and link the implementations with seealso. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16233) test_sparkSQL.R is failing
[ https://issues.apache.org/jira/browse/SPARK-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16233: Description: By running {code} ./R/run-tests.sh {code} Getting error: {code} xin:spark xr$ ./R/run-tests.sh Warning: Ignoring non-spark config property: SPARK_SCALA_VERSION=2.11 Loading required package: methods Attaching package: ‘SparkR’ The following object is masked from ‘package:testthat’: describe The following objects are masked from ‘package:stats’: cov, filter, lag, na.omit, predict, sd, var, window The following objects are masked from ‘package:base’: as.data.frame, colnames, colnames<-, drop, endsWith, intersect, rank, rbind, sample, startsWith, subset, summary, transform, union binary functions: ... functions on binary files: broadcast variables: .. functions in client.R: . test functions in sparkR.R: .Re-using existing Spark Context. Call sparkR.session.stop() or restart R to create a new Spark Context Re-using existing Spark Context. Call sparkR.session.stop() or restart R to create a new Spark Context ... include an external JAR in SparkContext: Warning: Ignoring non-spark config property: SPARK_SCALA_VERSION=2.11 .. include R packages: MLlib functions: .SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. .27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Dictionary is on 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Validation is off 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Maximum row group padding size is 0 bytes 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. 
allocated memory: 65,622 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 70B for [label] BINARY: 1 values, 21B raw, 23B comp, 1 pages, encodings: [PLAIN, RLE, BIT_PACKED] 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 87B for [terms, list, element, list, element] BINARY: 2 values, 42B raw, 43B comp, 1 pages, encodings: [PLAIN, RLE] 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 30B for [hasIntercept] BOOLEAN: 1 values, 1B raw, 3B comp, 1 pages, encodings: [PLAIN, BIT_PACKED] 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Dictionary is on 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Validation is off 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Maximum row group padding size is 0 bytes 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 49 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 90B for [labels, list, element] BINARY: 3 values, 50B raw, 50B comp, 1 pages, encodings: [PLAIN, RLE] 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Dictionary is on 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Validation is off 27-Jun-2016 1:51:26 PM INFO:
[jira] [Commented] (SPARK-16233) test_sparkSQL.R is failing
[ https://issues.apache.org/jira/browse/SPARK-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351703#comment-15351703 ] Xin Ren commented on SPARK-16233: - I'm working on this > test_sparkSQL.R is failing > -- > > Key: SPARK-16233 > URL: https://issues.apache.org/jira/browse/SPARK-16233 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Minor > > By running > {code} > ./R/run-tests.sh > {code} > Getting error: > {code} > 15. Error: create DataFrame from list or data.frame (@test_sparkSQL.R#277) > - > java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/PreInsertCastAndRename$ > at > org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:69) > at > org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63) > at > org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) > at > org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:533) > at > org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:293) > at org.apache.spark.sql.api.r.SQLUtils$.createDF(SQLUtils.scala:135) > at org.apache.spark.sql.api.r.SQLUtils.createDF(SQLUtils.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141) > at > org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86) > at > org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) > at java.lang.Thread.run(Thread.java:745) > 1: createDataFrame(l, c("a", "b")) at > /Users/quickmobile/workspace/spark/R/lib/SparkR/tests/testthat/test_sparkSQL.R:277 > 2: dispatchFunc("createDataFrame(data, schema = NULL, samplingRatio = 1.0)", > x, ...) > 3: f(x, ...) > 4: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "createDF", srdd, > schema$jobj, >sparkSession) > 5: invokeJava(isStatic = TRUE, className, methodName, ...) > 6: stop(readString(conn)) > DONE > === > Execution halted > {code} > Cause: most probably these tests are still using the deprecated > 'createDataFrame(sqlContext, ...)' form; the test method invocations should be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For
[jira] [Created] (SPARK-16233) test_sparkSQL.R is failing
Xin Ren created SPARK-16233: --- Summary: test_sparkSQL.R is failing Key: SPARK-16233 URL: https://issues.apache.org/jira/browse/SPARK-16233 Project: Spark Issue Type: Bug Components: SparkR, Tests Affects Versions: 2.0.0 Reporter: Xin Ren Priority: Minor By running {code} ./R/run-tests.sh {code} Getting error: {code} 15. Error: create DataFrame from list or data.frame (@test_sparkSQL.R#277) - java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/PreInsertCastAndRename$ at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:69) at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63) at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:533) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:293) at org.apache.spark.sql.api.r.SQLUtils$.createDF(SQLUtils.scala:135) at org.apache.spark.sql.api.r.SQLUtils.createDF(SQLUtils.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141) at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86) at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) at java.lang.Thread.run(Thread.java:745) 1: createDataFrame(l, c("a", "b")) at /Users/quickmobile/workspace/spark/R/lib/SparkR/tests/testthat/test_sparkSQL.R:277 2: dispatchFunc("createDataFrame(data, schema = NULL, samplingRatio = 1.0)", x, ...) 3: f(x, ...) 4: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "createDF", srdd, schema$jobj, sparkSession) 5: invokeJava(isStatic = TRUE, className, methodName, ...) 6: stop(readString(conn)) DONE === Execution halted {code} Cause: most probably these tests are still using the deprecated 'createDataFrame(sqlContext, ...)' form; the test method invocations should be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
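As a sketch of the fix that cause suggests, the 1.x-style calls would be migrated to the Spark 2.0 API, roughly as follows; the variable names are illustrative and the migration is an assumption based on the deprecation noted above:

{code}
# old 1.x-style invocation that the failing tests were presumably still using
df <- createDataFrame(sqlContext, l, c("a", "b"))

# 2.0-style invocation: start a session first, then drop the sqlContext argument
sparkR.session()
df <- createDataFrame(l, c("a", "b"))
{code}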
[jira] [Commented] (SPARK-16140) Group k-means method in generated doc
[ https://issues.apache.org/jira/browse/SPARK-16140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346776#comment-15346776 ] Xin Ren commented on SPARK-16140: - OK, I'll target to finish it this weekend. Thanks for the tips, I'll keep it concise and clean. > Group k-means method in generated doc > - > > Key: SPARK-16140 > URL: https://issues.apache.org/jira/browse/SPARK-16140 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Xin Ren > Labels: starter > > Follow SPARK-16107 and group the doc of spark.kmeans, predict(KM), > summary(KM), read/write.ml(KM) under Rd spark.kmeans. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
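The grouping described in this ticket would presumably rely on roxygen2's {{@rdname}} tag so that the methods share one Rd page; a rough illustrative sketch, where the method body is a simplified assumption about SparkR's internals:

{code}
# grouping predict(KM) under the spark.kmeans Rd via @rdname
#' @rdname spark.kmeans
#' @export
setMethod("predict", signature(object = "KMeansModel"),
          function(object, newData) {
            # delegate prediction to the JVM-side model wrapper
            dataFrame(callJMethod(object@jobj, "transform", newData@sdf))
          })
{code}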
[jira] [Commented] (SPARK-16140) Group k-means method in generated doc
[ https://issues.apache.org/jira/browse/SPARK-16140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345254#comment-15345254 ] Xin Ren commented on SPARK-16140: - Maybe I can take this one as a warm-up, thanks Xiangrui :) > Group k-means method in generated doc > - > > Key: SPARK-16140 > URL: https://issues.apache.org/jira/browse/SPARK-16140 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng > Labels: starter > > Follow SPARK-16107 and group the doc of spark.kmeans, predict(KM), > summary(KM), read/write.ml(KM) under Rd spark.kmeans. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16147) Add package docs to packages under spark.ml
[ https://issues.apache.org/jira/browse/SPARK-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345246#comment-15345246 ] Xin Ren commented on SPARK-16147: - Hi Xiangrui, I can help with this if you need more hands. :) > Add package docs to packages under spark.ml > --- > > Key: SPARK-16147 > URL: https://issues.apache.org/jira/browse/SPARK-16147 > Project: Spark > Issue Type: Documentation > Components: Documentation, MLlib >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > Some packages do not have package docs. It would improve the documentation if > we wrote a short summary for each package. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15829) spark master webpage links to application UI broke when running in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-15829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325073#comment-15325073 ] Xin Ren commented on SPARK-15829: - Sorry Andy, my bad. I'm running on port 7077 and in client mode. > spark master webpage links to application UI broke when running in cluster > mode > --- > > Key: SPARK-15829 > URL: https://issues.apache.org/jira/browse/SPARK-15829 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.6.1 > Environment: AWS ec2 cluster >Reporter: Andrew Davidson >Priority: Critical > > Hi > I created a cluster using the spark-1.6.1-bin-hadoop2.6/ec2/spark-ec2 > I use the standalone cluster manager. I have a streaming app running in > cluster mode. I notice the master webpage links to the application UI page > are incorrect. > It does not look like JIRA will let me upload images. I'll try and describe > the web pages and the bug. > My master is running on > http://ec2-54-215-230-73.us-west-1.compute.amazonaws.com:8080/ > It has a section marked "applications". If I click on one of the running > application ids I am taken to a page showing "Executor Summary". This page > has a link to the 'application detail UI'; the url is > http://ec2-54-215-230-73.us-west-1.compute.amazonaws.com:4041/ > Notice it thinks the application UI is running on the cluster master. > It is actually running on the same machine as the driver on port 4041. I was > able to reverse engineer the url by noticing the private ip address is part of > the worker id. For example worker-20160322041632-172.31.23.201-34909 > Next I went to the AWS EC2 console to find the public DNS name for this > machine > http://ec2-54-193-104-169.us-west-1.compute.amazonaws.com:4041/streaming/ > Kind regards > Andy -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15829) spark master webpage links to application UI broke when running in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-15829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324789#comment-15324789 ] Xin Ren commented on SPARK-15829: - Hi Andy, maybe you want to check your port configuration to make sure the port is not in use. I just tried it on my cluster, which is also on EC2 with v1.6.1, and the 'application detail UI' link is working properly. Just for your information. > spark master webpage links to application UI broke when running in cluster > mode > --- > > Key: SPARK-15829 > URL: https://issues.apache.org/jira/browse/SPARK-15829 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.6.1 > Environment: AWS ec2 cluster >Reporter: Andrew Davidson >Priority: Critical > > Hi > I created a cluster using the spark-1.6.1-bin-hadoop2.6/ec2/spark-ec2 > I use the standalone cluster manager. I have a streaming app running in > cluster mode. I notice the master webpage links to the application UI page > are incorrect. > It does not look like JIRA will let me upload images. I'll try and describe > the web pages and the bug. > My master is running on > http://ec2-54-215-230-73.us-west-1.compute.amazonaws.com:8080/ > It has a section marked "applications". If I click on one of the running > application ids I am taken to a page showing "Executor Summary". This page > has a link to the 'application detail UI'; the url is > http://ec2-54-215-230-73.us-west-1.compute.amazonaws.com:4041/ > Notice it thinks the application UI is running on the cluster master. > It is actually running on the same machine as the driver on port 4041. I was > able to reverse engineer the url by noticing the private ip address is part of > the worker id. For example worker-20160322041632-172.31.23.201-34909 > Next I went to the AWS EC2 console to find the public DNS name for this > machine > http://ec2-54-193-104-169.us-west-1.compute.amazonaws.com:4041/streaming/ > Kind regards > Andy -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
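To act on the port suggestion above, one quick check, assuming lsof is available on the box, is to see what is already bound to the web UI ports:

{code}
# list processes bound to the default application UI port and the next fallback
lsof -i :4040 -i :4041
{code}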
[jira] [Updated] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-15509: Description: Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features" column, which conflicts with the existing "features" column from the LibSVM loader. E.g., using the "mnist" dataset from LibSVM: {code} training <- loadDF(sqlContext, ".../mnist", "libsvm") model <- spark.naiveBayes(label ~ features, training) {code} This fails with: {code} 16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed Error in invokeJava(isStatic = TRUE, className, methodName, ...) : java.lang.IllegalArgumentException: Output column features already exists. at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca {code} The same issue appears for the "label" column once you rename the "features" column. was: Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features" column, which conflicts with the existing "features" column from the LibSVM loader. E.g., using the "mnist" dataset from LibSVM: {code} training <- loadDF(sqlContext, ".../mnist", "libsvm") model <- naiveBayes(label ~ features, training) {code} This fails with: {code} 16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed Error in invokeJava(isStatic = TRUE, className, methodName, ...) : java.lang.IllegalArgumentException: Output column features already exists. at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca {code} The same issue appears for the "label" column once you rename the "features" column. 
> R MLlib algorithms should support input columns "features" and "label" > -- > > Key: SPARK-15509 > URL: https://issues.apache.org/jira/browse/SPARK-15509 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Joseph K. Bradley > > Currently in SparkR, when you load a LibSVM dataset using the sqlContext and > then pass it to an MLlib algorithm, the ML wrappers will fail since they will > try to create a "features" column, which conflicts with the existing > "features" column from the LibSVM loader. E.g., using the "mnist" dataset > from LibSVM: > {code} > training <- loadDF(sqlContext, ".../mnist", "libsvm") > model <- spark.naiveBayes(label ~ features, training) > {code} > This fails with: > {code} > 16/05/24 11:52:41 ERROR RBackendHandler: fit on > org.apache.spark.ml.r.NaiveBayesWrapper failed > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : > java.lang.IllegalArgumentException: Output column features already exists. > at >
[jira] [Updated] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-15509: Description: Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features" column, which conflicts with the existing "features" column from the LibSVM loader. E.g., using the "mnist" dataset from LibSVM: {code} training <- loadDF(sqlContext, ".../mnist", "libsvm") model <- naiveBayes(label ~ features, training) {code} This fails with: {code} 16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed Error in invokeJava(isStatic = TRUE, className, methodName, ...) : java.lang.IllegalArgumentException: Output column features already exists. at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca {code} The same issue appears for the "label" column once you rename the "features" column. was: Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features" column, which conflicts with the existing "features" column from the LibSVM loader. E.g., using the "mnist" dataset from LibSVM: {code} training <- loadDF(sqlContext, ".../mnist", "libsvm") model <- spark.naiveBayes(label ~ features, training) {code} This fails with: {code} 16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed Error in invokeJava(isStatic = TRUE, className, methodName, ...) : java.lang.IllegalArgumentException: Output column features already exists. at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca {code} The same issue appears for the "label" column once you rename the "features" column. 
> R MLlib algorithms should support input columns "features" and "label" > -- > > Key: SPARK-15509 > URL: https://issues.apache.org/jira/browse/SPARK-15509 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Joseph K. Bradley > > Currently in SparkR, when you load a LibSVM dataset using the sqlContext and > then pass it to an MLlib algorithm, the ML wrappers will fail since they will > try to create a "features" column, which conflicts with the existing > "features" column from the LibSVM loader. E.g., using the "mnist" dataset > from LibSVM: > {code} > training <- loadDF(sqlContext, ".../mnist", "libsvm") > model <- naiveBayes(label ~ features, training) > {code} > This fails with: > {code} > 16/05/24 11:52:41 ERROR RBackendHandler: fit on > org.apache.spark.ml.r.NaiveBayesWrapper failed > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : > java.lang.IllegalArgumentException: Output column features already exists. > at >
[jira] [Commented] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306977#comment-15306977 ] Xin Ren commented on SPARK-15509: - I can reproduce the error here now, sorry for bothering Joseph > R MLlib algorithms should support input columns "features" and "label" > -- > > Key: SPARK-15509 > URL: https://issues.apache.org/jira/browse/SPARK-15509 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Joseph K. Bradley > > Currently in SparkR, when you load a LibSVM dataset using the sqlContext and > then pass it to an MLlib algorithm, the ML wrappers will fail since they will > try to create a "features" column, which conflicts with the existing > "features" column from the LibSVM loader. E.g., using the "mnist" dataset > from LibSVM: > {code} > training <- loadDF(sqlContext, ".../mnist", "libsvm") > model <- naiveBayes(label ~ features, training) > {code} > This fails with: > {code} > 16/05/24 11:52:41 ERROR RBackendHandler: fit on > org.apache.spark.ml.r.NaiveBayesWrapper failed > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : > java.lang.IllegalArgumentException: Output column features already exists. > at > org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) > at > scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) > at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) > at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) > at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) > at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) > at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) > at > org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) > at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca > {code} > The same issue appears for the "label" column once you rename the "features" > column. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
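Until the wrappers handle the name clash, one hypothetical workaround sketch is to rename the LibSVM loader's columns before fitting, so RFormula can create its own "features" and "label" output columns; the new column names and the (data, formula) argument order of spark.naiveBayes are assumptions here:

{code}
# rename the loader's columns so they no longer collide with RFormula's outputs
training2 <- withColumnRenamed(training, "features", "featuresVec")
training2 <- withColumnRenamed(training2, "label", "target")
model <- spark.naiveBayes(training2, target ~ featuresVec)
{code}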
[jira] [Commented] (SPARK-15645) Fix some typos of Streaming module
[ https://issues.apache.org/jira/browse/SPARK-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306092#comment-15306092 ] Xin Ren commented on SPARK-15645: - Thank you very much for this explanation Sean, I'll try to avoid this kind of JIRA in the future. > Fix some typos of Streaming module > -- > > Key: SPARK-15645 > URL: https://issues.apache.org/jira/browse/SPARK-15645 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Trivial > > No code change, just some typo fixing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-15645) Fix some typos of Streaming module
[ https://issues.apache.org/jira/browse/SPARK-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305971#comment-15305971 ] Xin Ren edited comment on SPARK-15645 at 5/29/16 4:21 PM: -- Sorry... in this case, what should be done for very trivial things? Just open a PR without a JIRA ticket? was (Author: iamshrek): sorry...in this case what should be done for some very trivial things? > Fix some typos of Streaming module > -- > > Key: SPARK-15645 > URL: https://issues.apache.org/jira/browse/SPARK-15645 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Trivial > > No code change, just some typo fixing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15645) Fix some typos of Streaming module
[ https://issues.apache.org/jira/browse/SPARK-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305971#comment-15305971 ] Xin Ren commented on SPARK-15645: - sorry...in this case what should be done for some very trivial things? > Fix some typos of Streaming module > -- > > Key: SPARK-15645 > URL: https://issues.apache.org/jira/browse/SPARK-15645 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Trivial > > No code change, just some typo fixing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15645) Fix some typos of Streaming module
Xin Ren created SPARK-15645: --- Summary: Fix some typos of Streaming module Key: SPARK-15645 URL: https://issues.apache.org/jira/browse/SPARK-15645 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 2.0.0 Reporter: Xin Ren Priority: Trivial No code change, just some typo fixing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304777#comment-15304777 ] Xin Ren commented on SPARK-15509: - Hi [~josephkb], I tried many times but cannot reproduce your error message here. I tried R naiveBayes package and also spark.naiveBayes, but both got {code} naiveBayes formula interface handles data frames or arrays only {code} below is what I did: {code} ./bin/sparkR --master "local[2]" > training <- loadDF(sqlContext, "data/mllib/sample_libsvm_data.txt", "libsvm") > model <- spark.naiveBayes(label ~ features, training) Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘spark.naiveBayes’ for signature ‘"formula", "SparkDataFrame"’ > model <- naiveBayes(label ~ features, training) Error in naiveBayes.formula(label ~ features, training) : naiveBayes formula interface handles data frames or arrays only {code} then I tried example here and it's working http://spark.apache.org/docs/latest/sparkr.html#gaussian-glm-model {code} df <- createDataFrame(sqlContext, iris) model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df, family = "gaussian") {code} so I compare these 2 examples, and features are 'vector' type and df above is normal columns. {code} > df SparkDataFrame[Sepal_Length:double, Sepal_Width:double, Petal_Length:double, Petal_Width:double, Species:string] > training SparkDataFrame[label:double, features:vector] {code} I also downloaded "mnist" dataset LibSVM, and same error. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#mnist Is there anything I'm doing wrong? I'm using R package of naiveBayes (http://www.inside-r.org/packages/cran/e1071/docs/naivebayes), maybe I'm using the wrong package? Thank you very much Joseph. > R MLlib algorithms should support input columns "features" and "label" > -- > > Key: SPARK-15509 > URL: https://issues.apache.org/jira/browse/SPARK-15509 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Joseph K. Bradley > > Currently in SparkR, when you load a LibSVM dataset using the sqlContext and > then pass it to an MLlib algorithm, the ML wrappers will fail since they will > try to create a "features" column, which conflicts with the existing > "features" column from the LibSVM loader. E.g., using the "mnist" dataset > from LibSVM: > {code} > training <- loadDF(sqlContext, ".../mnist", "libsvm") > model <- naiveBayes(label ~ features, training) > {code} > This fails with: > {code} > 16/05/24 11:52:41 ERROR RBackendHandler: fit on > org.apache.spark.ml.r.NaiveBayesWrapper failed > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : > java.lang.IllegalArgumentException: Output column features already exists. 
> at > org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) > at > scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) > at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) > at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) > at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) > at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) > at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) > at > org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) > at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca > {code} > The same issue appears for the "label" column once you rename the "features" > column. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
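Incidentally, the dispatch failure in the session above ("unable to find an inherited method ... for signature '"formula", "SparkDataFrame"'") suggests that spark.naiveBayes expects the SparkDataFrame first and the formula second, so the reproduction call would presumably need to be:

{code}
# argument order implied by the dispatch error: data first, then formula
model <- spark.naiveBayes(training, label ~ features)
{code}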
[jira] [Commented] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302932#comment-15302932 ] Xin Ren commented on SPARK-15509: - Sure I'll try to finish by end of this week, thanks Joseph > R MLlib algorithms should support input columns "features" and "label" > -- > > Key: SPARK-15509 > URL: https://issues.apache.org/jira/browse/SPARK-15509 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Joseph K. Bradley > > Currently in SparkR, when you load a LibSVM dataset using the sqlContext and > then pass it to an MLlib algorithm, the ML wrappers will fail since they will > try to create a "features" column, which conflicts with the existing > "features" column from the LibSVM loader. E.g., using the "mnist" dataset > from LibSVM: > {code} > training <- loadDF(sqlContext, ".../mnist", "libsvm") > model <- naiveBayes(label ~ features, training) > {code} > This fails with: > {code} > 16/05/24 11:52:41 ERROR RBackendHandler: fit on > org.apache.spark.ml.r.NaiveBayesWrapper failed > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : > java.lang.IllegalArgumentException: Output column features already exists. > at > org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) > at > scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) > at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) > at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) > at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) > at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) > at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) > at > org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) > at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca > {code} > The same issue appears for the "label" column once you rename the "features" > column. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15542) Make error message clear for script './R/install-dev.sh' when R is missing on Mac
[ https://issues.apache.org/jira/browse/SPARK-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-15542: Description: I followed the instructions here https://github.com/apache/spark/tree/master/R to build the SparkR project. When running {code}build/mvn -DskipTests -Psparkr package{code} I got the error below: {code} [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM ... SUCCESS [ 23.589 s] [INFO] Spark Project Tags . SUCCESS [ 19.389 s] [INFO] Spark Project Sketch ... SUCCESS [ 6.386 s] [INFO] Spark Project Networking ... SUCCESS [ 12.296 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 7.817 s] [INFO] Spark Project Unsafe ... SUCCESS [ 10.825 s] [INFO] Spark Project Launcher . SUCCESS [ 12.262 s] [INFO] Spark Project Core . FAILURE [01:40 min] [INFO] Spark Project GraphX ... SKIPPED [INFO] Spark Project Streaming SKIPPED [INFO] Spark Project Catalyst . SKIPPED [INFO] Spark Project SQL .. SKIPPED [INFO] Spark Project ML Local Library . SKIPPED [INFO] Spark Project ML Library ... SKIPPED [INFO] Spark Project Tools SKIPPED [INFO] Spark Project Hive . SKIPPED [INFO] Spark Project REPL . SKIPPED [INFO] Spark Project Assembly . SKIPPED [INFO] Spark Project External Flume Sink .. SKIPPED [INFO] Spark Project External Flume ... SKIPPED [INFO] Spark Project External Flume Assembly .. SKIPPED [INFO] Spark Integration for Kafka 0.8 SKIPPED [INFO] Spark Project Examples . SKIPPED [INFO] Spark Project External Kafka Assembly .. SKIPPED [INFO] Spark Project Java 8 Tests . SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 03:14 min [INFO] Finished at: 2016-05-25T21:51:58+00:00 [INFO] Final Memory: 55M/782M [INFO] [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (sparkr-pkg) on project spark-core_2.11: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :spark-core_2.11 {code} This error turned out to be caused by {code}./R/install-dev.sh{code}. I then ran the install-dev.sh script directly, and got {code} mbp185-xr:spark xin$ ./R/install-dev.sh usage: dirname path {code} This message was very confusing to me; I then found that R was not properly configured on my Mac, and the script uses {code}$(which R){code} to get the R home. I tried the same situation on CentOS with R missing, and it gives a very clear error message, while macOS does not. On CentOS: {code} [root@ip-xxx-31-9-xx spark]# which R /usr/bin/which: no R in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin:/root/bin){code} but on Mac, if R is not found, nothing is returned, and this causes the confusing R build failure when running R/install-dev.sh: {code} mbp185-xr:spark xin$ which R mbp185-xr:spark xin$ {code} So a clearer message is needed for this R misconfiguration when running R/install-dev.sh. was: I followed instructions here https://github.com/apache/spark/tree/master/R to build sparkR project.
When running {code}build/mvn -DskipTests -Psparkr package{code} then I got error below: {code} [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM ... SUCCESS [ 23.589 s] [INFO] Spark Project Tags . SUCCESS [ 19.389 s] [INFO] Spark Project Sketch ... SUCCESS [ 6.386 s] [INFO] Spark Project Networking
[jira] [Created] (SPARK-15542) Make error message clear for script './R/install-dev.sh' when R is missing on Mac
Xin Ren created SPARK-15542: --- Summary: Make error message clear for script './R/install-dev.sh' when R is missing on Mac Key: SPARK-15542 URL: https://issues.apache.org/jira/browse/SPARK-15542 Project: Spark Issue Type: Improvement Components: SparkR Affects Versions: 2.0.0 Environment: Mac OS El Capitan Reporter: Xin Ren Priority: Minor I followed the instructions here https://github.com/apache/spark/tree/master/R to build the SparkR project. When running {code}build/mvn -DskipTests -Psparkr package{code} I got the error below: {code} [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM ... SUCCESS [ 23.589 s] [INFO] Spark Project Tags . SUCCESS [ 19.389 s] [INFO] Spark Project Sketch ... SUCCESS [ 6.386 s] [INFO] Spark Project Networking ... SUCCESS [ 12.296 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 7.817 s] [INFO] Spark Project Unsafe ... SUCCESS [ 10.825 s] [INFO] Spark Project Launcher . SUCCESS [ 12.262 s] [INFO] Spark Project Core . FAILURE [01:40 min] [INFO] Spark Project GraphX ... SKIPPED [INFO] Spark Project Streaming SKIPPED [INFO] Spark Project Catalyst . SKIPPED [INFO] Spark Project SQL .. SKIPPED [INFO] Spark Project ML Local Library . SKIPPED [INFO] Spark Project ML Library ... SKIPPED [INFO] Spark Project Tools SKIPPED [INFO] Spark Project Hive . SKIPPED [INFO] Spark Project REPL . SKIPPED [INFO] Spark Project Assembly . SKIPPED [INFO] Spark Project External Flume Sink .. SKIPPED [INFO] Spark Project External Flume ... SKIPPED [INFO] Spark Project External Flume Assembly .. SKIPPED [INFO] Spark Integration for Kafka 0.8 SKIPPED [INFO] Spark Project Examples . SKIPPED [INFO] Spark Project External Kafka Assembly .. SKIPPED [INFO] Spark Project Java 8 Tests . SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 03:14 min [INFO] Finished at: 2016-05-25T21:51:58+00:00 [INFO] Final Memory: 55M/782M [INFO] [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (sparkr-pkg) on project spark-core_2.11: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :spark-core_2.11 {code} This error turned out to be caused by {code}./R/install-dev.sh{code}. I then ran the install-dev.sh script directly, and got {code} mbp185-xr:spark quickmobile$ ./R/install-dev.sh usage: dirname path {code} This message was very confusing to me; I then found that R was not properly configured on my Mac, and the script uses {code}$(which R){code} to get the R home. I tried the same situation on CentOS with R missing, and it gives a very clear error message, while macOS does not.
On CentOS: {code} [root@ip-xxx-31-9-xx spark]# which R /usr/bin/which: no R in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin:/root/bin){code} but on Mac, if R is not found, nothing is returned, and this causes the confusing R build failure when running R/install-dev.sh: {code} mbp185-xr:spark xin$ which R mbp185-xr:spark xin$ {code} So a clearer message is needed for this R misconfiguration when running R/install-dev.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298787#comment-15298787 ] Xin Ren commented on SPARK-15509: - Hi Joseph, I'd like to try to fix this one. Thanks a lot :) > R MLlib algorithms should support input columns "features" and "label" > -- > > Key: SPARK-15509 > URL: https://issues.apache.org/jira/browse/SPARK-15509 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley > > Currently in SparkR, when you load a LibSVM dataset using the sqlContext and > then pass it to an MLlib algorithm, the ML wrappers will fail since they will > try to create a "features" column, which conflicts with the existing > "features" column from the LibSVM loader. E.g., using the "mnist" dataset > from LibSVM: > {code} > training <- loadDF(sqlContext, ".../mnist", "libsvm") > model <- naiveBayes(label ~ features, training) > {code} > This fails with: > {code} > 16/05/24 11:52:41 ERROR RBackendHandler: fit on > org.apache.spark.ml.r.NaiveBayesWrapper failed > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : > java.lang.IllegalArgumentException: Output column features already exists. > at > org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) > at > scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) > at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) > at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) > at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) > at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) > at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) > at > org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) > at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca > {code} > The same issue appears for the "label" column once you rename the "features" > column. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15130) PySpark decision tree params should include default values to match Scala
[ https://issues.apache.org/jira/browse/SPARK-15130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271531#comment-15271531 ] Xin Ren commented on SPARK-15130: - Hi, I just found that the DecisionTreeClassifier class in PySpark has a setParams method, which roughly matches the Scala one. Do you mean to create a separate "Param" class? {code} @keyword_only @since("1.4.0") def setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", probabilityCol="probability", rawPredictionCol="rawPrediction", maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, impurity="gini", seed=None): """ setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", \ probabilityCol="probability", rawPredictionCol="rawPrediction", \ maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, \ maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, impurity="gini", \ seed=None) Sets params for the DecisionTreeClassifier. """ kwargs = self.setParams._input_kwargs return self._set(**kwargs) {code} > PySpark decision tree params should include default values to match Scala > - > > Key: SPARK-15130 > URL: https://issues.apache.org/jira/browse/SPARK-15130 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML, PySpark >Reporter: holdenk >Priority: Minor > > As part of checking the documentation in SPARK-14813, PySpark decision tree > params do not include the default values (unlike the Scala ones). While the > existing Scala default values will still be used, this information is likely > worth exposing in the docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
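For reference, the Scala side records these defaults through Params.setDefault, which is where the values the PySpark docs should mirror come from. A minimal self-contained sketch with toy params (not Spark's actual DecisionTreeParams; the class name and doc strings are illustrative):
{code}
import org.apache.spark.ml.param.{IntParam, ParamMap, Params}
import org.apache.spark.ml.util.Identifiable

class ToyTreeParams(override val uid: String) extends Params {
  def this() = this(Identifiable.randomUID("toyTree"))

  val maxDepth = new IntParam(this, "maxDepth", "maximum depth of the tree")
  val maxBins = new IntParam(this, "maxBins", "max number of bins per feature")

  // These are the defaults the Python docstrings should also state.
  setDefault(maxDepth -> 5, maxBins -> 32)

  override def copy(extra: ParamMap): ToyTreeParams = defaultCopy(extra)
}
{code}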
[jira] [Commented] (SPARK-14817) ML, Graph, R 2.0 QA: Programming guide update and migration guide
[ https://issues.apache.org/jira/browse/SPARK-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269775#comment-15269775 ] Xin Ren commented on SPARK-14817: - ok, I'll start looking for new APIs. So just create new tickets under SPARK-14815? > ML, Graph, R 2.0 QA: Programming guide update and migration guide > - > > Key: SPARK-14817 > URL: https://issues.apache.org/jira/browse/SPARK-14817 > Project: Spark > Issue Type: Sub-task > Components: Documentation, GraphX, ML, MLlib, SparkR >Reporter: Joseph K. Bradley > > Before the release, we need to update the MLlib, GraphX, and SparkR > Programming Guides. Updates will include: > * Add migration guide subsection. > ** Use the results of the QA audit JIRAs and [SPARK-13448]. > * Check phrasing, especially in main sections (for outdated items such as "In > this release, ...") > For MLlib, we will make the DataFrame-based API (spark.ml) front-and-center, > to make it clear the RDD-based API is the older, maintenance-mode one. > * No docs for spark.mllib will be deleted; they will just be reorganized and > put in a subsection. > * If spark.ml docs are less complete, or if spark.ml docs say "refer to the > spark.mllib docs for details," then we should copy those details to the > spark.ml docs. This per-feature work can happen under [SPARK-14815]. > * This big reorganization should be done *after* docs are added for each > feature (to minimize merge conflicts). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14936) FlumePollingStreamSuite is slow
[ https://issues.apache.org/jira/browse/SPARK-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263129#comment-15263129 ] Xin Ren commented on SPARK-14936: - I'm trying to fix this one now :) > FlumePollingStreamSuite is slow > --- > > Key: SPARK-14936 > URL: https://issues.apache.org/jira/browse/SPARK-14936 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Reporter: Josh Rosen > > FlumePollingStreamSuite contains two tests which run for a minute each. This > seems excessively slow and we should speed it up if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
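One common way to speed such tests up (an assumption about the direction of a fix, not the actual patch) is to replace fixed sleeps with tight polling, e.g. via ScalaTest's Eventually:
{code}
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.{Millis, Seconds, Span}

object PollingSketch {
  // Poll every 100 ms for up to 10 s instead of sleeping for fixed
  // multi-second periods; the call returns as soon as the condition holds.
  def awaitEvents(received: () => Int, expected: Int): Unit =
    eventually(timeout(Span(10, Seconds)), interval(Span(100, Millis))) {
      assert(received() == expected)
    }
}
{code}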
[jira] [Commented] (SPARK-14935) DistributedSuite "local-cluster format" shouldn't actually launch clusters
[ https://issues.apache.org/jira/browse/SPARK-14935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260648#comment-15260648 ] Xin Ren commented on SPARK-14935: - I'd like to have a try on this one, thanks a lot :) > DistributedSuite "local-cluster format" shouldn't actually launch clusters > -- > > Key: SPARK-14935 > URL: https://issues.apache.org/jira/browse/SPARK-14935 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Reporter: Josh Rosen > Labels: starter > > In DistributedSuite, the "local-cluster format" test actually launches a > bunch of clusters, but this doesn't seem necessary for what should just be a > unit test of a regex. We should clean up the code so that this is testable > without actually launching a cluster, which should buy us about 20 seconds > per build. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
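To illustrate what a regex-only version of that test could look like, here is a sketch; the pattern below is an assumption modeled on the `local-cluster[numWorkers, coresPerWorker, memoryPerWorker]` master format, not the exact constant Spark uses:
{code}
object LocalClusterRegexCheck extends App {
  // Matches e.g. "local-cluster[2,1,1024]" and captures the three numbers.
  val LocalCluster =
    """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*\]""".r

  def parse(master: String): Option[(Int, Int, Int)] = master match {
    case LocalCluster(n, cores, mem) => Some((n.toInt, cores.toInt, mem.toInt))
    case _ => None
  }

  // These assertions never launch a cluster, so they run in milliseconds.
  assert(parse("local-cluster[2,1,1024]") == Some((2, 1, 1024)))
  assert(parse("local-cluster[ 2, 1, 1024 ]") == Some((2, 1, 1024)))
  assert(parse("local[4]").isEmpty)
}
{code}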
[jira] [Commented] (SPARK-14817) ML 2.0 QA: Programming guide update and migration guide
[ https://issues.apache.org/jira/browse/SPARK-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254371#comment-15254371 ] Xin Ren commented on SPARK-14817: - count me in too :) > ML 2.0 QA: Programming guide update and migration guide > --- > > Key: SPARK-14817 > URL: https://issues.apache.org/jira/browse/SPARK-14817 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, MLlib >Reporter: Joseph K. Bradley > > Before the release, we need to update the MLlib Programming Guide. Updates > will include: > * Make the DataFrame-based API (spark.ml) front-and-center, to make it clear > the RDD-based API is the older, maintenance-mode one. > ** No docs for spark.mllib will be deleted; they will just be reorganized and > put in a subsection. > ** If spark.ml docs are less complete, or if spark.ml docs say "refer to the > spark.mllib docs for details," then we should copy those details to the > spark.ml docs. > * Add migration guide subsection. > ** Use the results of the QA audit JIRAs. > * Check phrasing, especially in main sections (for outdated items such as "In > this release, ...") > If you would like to work on this task, please comment, and we can create & > link JIRAs for parts of this work (which should be broken into pieces for > this larger 2.0 release). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14569) Log instrumentation in KMeans
[ https://issues.apache.org/jira/browse/SPARK-14569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243228#comment-15243228 ] Xin Ren commented on SPARK-14569: - Hi I'd like to have a try on this one, thanks a lot :) > Log instrumentation in KMeans > - > > Key: SPARK-14569 > URL: https://issues.apache.org/jira/browse/SPARK-14569 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Timothy Hunter > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
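A hedged sketch of the kind of instrumentation being asked for (the object and method names are illustrative; Spark's internal Instrumentation helper may differ): log the training parameters and data size when fit() starts, and the outcome when it finishes, so every training run leaves a grep-able record.
{code}
import org.slf4j.LoggerFactory

object KMeansInstrumentationSketch {
  private val log = LoggerFactory.getLogger(getClass)

  // Called at the start of fit(): record params and input size.
  def logTrainingStart(k: Int, maxIter: Int, numExamples: Long): Unit =
    log.info(s"KMeans training started: k=$k, maxIter=$maxIter, numExamples=$numExamples")

  // Called at the end of fit(): record the result summary.
  def logTrainingEnd(cost: Double, actualIter: Int): Unit =
    log.info(s"KMeans training finished: cost=$cost after $actualIter iterations")
}
{code}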
[jira] [Commented] (SPARK-14300) Scala MLlib examples code merge and clean up
[ https://issues.apache.org/jira/browse/SPARK-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220206#comment-15220206 ] Xin Ren commented on SPARK-14300: - ok > Scala MLlib examples code merge and clean up > > > Key: SPARK-14300 > URL: https://issues.apache.org/jira/browse/SPARK-14300 > Project: Spark > Issue Type: Sub-task > Components: Examples >Reporter: Xusen Yin >Priority: Minor > Labels: starter > > Duplicated code that I found in scala/examples/mllib: > * scala/mllib > ** DecisionTreeRunner.scala > ** DenseGaussianMixture.scala > ** DenseKMeans.scala > ** GradientBoostedTreesRunner.scala > ** LDAExample.scala > ** LinearRegression.scala > ** SparseNaiveBayes.scala > ** StreamingLinearRegression.scala > ** StreamingLogisticRegression.scala > ** TallSkinnyPCA.scala > ** TallSkinnySVD.scala > * Unsure code duplications (need double check) > ** AbstractParams.scala > ** BinaryClassification.scala > ** Correlations.scala > ** CosineSimilarity.scala > ** DenseGaussianMixture.scala > ** FPGrowthExample.scala > ** MovieLensALS.scala > ** MultivariateSummarizer.scala > ** RandomRDDGeneration.scala > ** SampledRDDs.scala > When merging and cleaning that code, be sure not to disturb the existing > example on/off blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14300) Scala MLlib examples code merge and clean up
[ https://issues.apache.org/jira/browse/SPARK-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220178#comment-15220178 ] Xin Ren commented on SPARK-14300: - Hi Xusen, I can work on this one, thanks a lot :) > Scala MLlib examples code merge and clean up > > > Key: SPARK-14300 > URL: https://issues.apache.org/jira/browse/SPARK-14300 > Project: Spark > Issue Type: Sub-task > Components: Examples >Reporter: Xusen Yin >Priority: Minor > Labels: starter > > Duplicated code that I found in scala/examples/mllib: > * scala/mllib > ** DecisionTreeRunner.scala > ** DenseGaussianMixture.scala > ** DenseKMeans.scala > ** GradientBoostedTreesRunner.scala > ** LDAExample.scala > ** LinearRegression.scala > ** SparseNaiveBayes.scala > ** StreamingLinearRegression.scala > ** StreamingLogisticRegression.scala > ** TallSkinnyPCA.scala > ** TallSkinnySVD.scala > * Unsure code duplications (need double check) > ** AbstractParams.scala > ** BinaryClassification.scala > ** Correlations.scala > ** CosineSimilarity.scala > ** DenseGaussianMixture.scala > ** FPGrowthExample.scala > ** MovieLensALS.scala > ** MultivariateSummarizer.scala > ** RandomRDDGeneration.scala > ** SampledRDDs.scala > When merging and cleaning that code, be sure not to disturb the existing > example on/off blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-13765) method specialStateTransition(int, IntStream) is exceeding the 65535 bytes limit
[ https://issues.apache.org/jira/browse/SPARK-13765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren closed SPARK-13765. --- Resolution: Not A Problem It was caused by the way I imported the project. The error pops up when I run "./build/sbt eclipse" and then import the result directly as an existing Eclipse project. When I import it into Eclipse as a Maven project instead, the error is gone. > method specialStateTransition(int, IntStream) is exceeding the 65535 bytes > limit > > > Key: SPARK-13765 > URL: https://issues.apache.org/jira/browse/SPARK-13765 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 > Environment: Eclipse-Scala IDE > sitting on master branch >Reporter: Xin Ren > Attachments: Screen Shot 2016-03-08 at 9.52.48 PM.png > > > Eclipse-Scala IDE complains about a Java problem (*please see attached > screenshot*), but IntelliJ does not. > I'm not sure whether it is a bug or not. > {code} > The code of method specialStateTransition(int, IntStream) is exceeding the > 65535 bytes limit > SparkSqlParser_IdentifiersParser.java > /spark-catalyst_2.11/target/generated-sources/antlr3/org/apache/spark/sql/catalyst/parser >line 40380 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-13019) Replace example code in mllib-statistics.md using include_example
[ https://issues.apache.org/jira/browse/SPARK-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren reopened SPARK-13019: - need to fix scala-2.10 compile > Replace example code in mllib-statistics.md using include_example > - > > Key: SPARK-13019 > URL: https://issues.apache.org/jira/browse/SPARK-13019 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Xusen Yin >Assignee: Xin Ren >Priority: Minor > Labels: starter > Fix For: 2.0.0 > > > The example code in the user guide is embedded in the markdown and hence it > is not easy to test. It would be nice to automatically test them. This JIRA > is to discuss options to automate example code testing and see what we can do > in Spark 1.6. > Goal is to move actual example code to spark/examples and test compilation in > Jenkins builds. Then in the markdown, we can reference part of the code to > show in the user guide. This requires adding a Jekyll tag that is similar to > https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, > e.g., called include_example. > {code}{% include_example > scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala %}{code} > Jekyll will find > `examples/src/main/scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala` > and pick code blocks marked "example" and replace code block in > {code}{% highlight %}{code} > in the markdown. > See more sub-tasks in parent ticket: > https://issues.apache.org/jira/browse/SPARK-11337 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
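For anyone picking this up, the shape of an examples/ file that the include_example tag consumes looks roughly like this (a skeleton only, assuming "$example on$"/"$example off$" marker comments; the body is elided, not real code):
{code}
// examples/src/main/scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala
object SummaryStatisticsExample {
  def main(args: Array[String]): Unit = {
    // Setup code out here is compiled in Jenkins builds but not shown in the guide.
    // $example on$
    // ...the statistics code that should appear in mllib-statistics.md...
    // $example off$
  }
}
{code}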
[jira] [Commented] (SPARK-13660) ContinuousQuerySuite floods the logs with garbage
[ https://issues.apache.org/jira/browse/SPARK-13660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194085#comment-15194085 ] Xin Ren commented on SPARK-13660: - Thank you Shixiong > ContinuousQuerySuite floods the logs with garbage > - > > Key: SPARK-13660 > URL: https://issues.apache.org/jira/browse/SPARK-13660 > Project: Spark > Issue Type: Test > Components: Tests >Reporter: Shixiong Zhu > Labels: starter > > https://github.com/apache/spark/pull/11439 added a utility method > "testQuietly". We can use it for ContinuousQuerySuite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
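For reference, a sketch of what such a testQuietly helper does (modeled on the utility the linked PR describes; the exact Spark signature may differ): raise the log threshold around one test body so expected failures don't flood the build output.
{code}
import org.apache.log4j.{Level, LogManager}

// Run `body` with the root log level raised to ERROR, then restore it, so a
// test that intentionally fails queries doesn't spam INFO/WARN lines.
def quietly[T](body: => T): T = {
  val rootLogger = LogManager.getRootLogger
  val oldLevel = rootLogger.getLevel
  rootLogger.setLevel(Level.ERROR)
  try body finally rootLogger.setLevel(oldLevel)
}
{code}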