[jira] [Commented] (SPARK-28555) Recover options and properties and pass them back into the v1 API
[ https://issues.apache.org/jira/browse/SPARK-28555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895470#comment-16895470 ] Xin Ren commented on SPARK-28555:

I'm working on it :)

> Recover options and properties and pass them back into the v1 API
>
> Key: SPARK-28555
> URL: https://issues.apache.org/jira/browse/SPARK-28555
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Xin Ren
> Priority: Minor
>
> When tables are created, the {{CREATE TABLE}} syntax supports both {{TBLPROPERTIES}} and {{OPTIONS}}. Options were used in v1 to configure the table itself, like options passed to {{DataFrameReader}}. Right now, both properties and options are stored in v2 table properties, because v2 only has properties, not both. But we aren't able to recover which properties were set through {{OPTIONS}} and which were set through {{TBLPROPERTIES}}.
> Instead of the current behavior, I think options should be prefixed with {{option.}}. That way, we can recover options and properties and pass them back into the v1 API.
[jira] [Updated] (SPARK-28555) Recover options and properties and pass them back into the v1 API
[ https://issues.apache.org/jira/browse/SPARK-28555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-28555:

Issue Type: Sub-task (was: Improvement)
Parent: SPARK-22386

> Recover options and properties and pass them back into the v1 API
>
> Key: SPARK-28555
> URL: https://issues.apache.org/jira/browse/SPARK-28555
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Xin Ren
> Priority: Minor
>
> When tables are created, the {{CREATE TABLE}} syntax supports both {{TBLPROPERTIES}} and {{OPTIONS}}. Options were used in v1 to configure the table itself, like options passed to {{DataFrameReader}}. Right now, both properties and options are stored in v2 table properties, because v2 only has properties, not both. But we aren't able to recover which properties were set through {{OPTIONS}} and which were set through {{TBLPROPERTIES}}.
> Instead of the current behavior, I think options should be prefixed with {{option.}}. That way, we can recover options and properties and pass them back into the v1 API.
[jira] [Created] (SPARK-28555) Recover options and properties and pass them back into the v1 API
Xin Ren created SPARK-28555:

Summary: Recover options and properties and pass them back into the v1 API
Key: SPARK-28555
URL: https://issues.apache.org/jira/browse/SPARK-28555
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Xin Ren

When tables are created, the {{CREATE TABLE}} syntax supports both {{TBLPROPERTIES}} and {{OPTIONS}}. Options were used in v1 to configure the table itself, like options passed to {{DataFrameReader}}. Right now, both properties and options are stored in v2 table properties, because v2 only has properties, not both. But we aren't able to recover which properties were set through {{OPTIONS}} and which were set through {{TBLPROPERTIES}}.

Instead of the current behavior, I think options should be prefixed with {{option.}}. That way, we can recover options and properties and pass them back into the v1 API.
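A minimal sketch of the proposed convention (the object and method names here are illustrative, not from the Spark codebase; only the {{option.}} prefix comes from the issue):

{code}
object TablePropertyConventions {
  private val OptionPrefix = "option."

  // Store v1 OPTIONS alongside TBLPROPERTIES in the single v2 property map,
  // marking each option with the proposed "option." prefix.
  def toV2Properties(
      options: Map[String, String],
      properties: Map[String, String]): Map[String, String] =
    properties ++ options.map { case (k, v) => (OptionPrefix + k) -> v }

  // Recover the original v1 split from a v2 property map.
  def fromV2Properties(
      v2Props: Map[String, String]): (Map[String, String], Map[String, String]) = {
    val (prefixed, plain) = v2Props.partition { case (k, _) => k.startsWith(OptionPrefix) }
    (prefixed.map { case (k, v) => k.stripPrefix(OptionPrefix) -> v }, plain)
  }
}
{code}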
[jira] [Closed] (SPARK-28139) DataSourceV2: Add AlterTable v2 implementation
[ https://issues.apache.org/jira/browse/SPARK-28139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren closed SPARK-28139.

> DataSourceV2: Add AlterTable v2 implementation
>
> Key: SPARK-28139
> URL: https://issues.apache.org/jira/browse/SPARK-28139
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Priority: Major
> Fix For: 3.0.0
>
> SPARK-27857 updated the parser for v2 ALTER TABLE statements. This tracks implementing those using a v2 catalog.
[jira] [Updated] (SPARK-20498) RandomForestRegressionModel should expose getMaxDepth in PySpark
[ https://issues.apache.org/jira/browse/SPARK-20498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-20498:

Sure, please go ahead.

> RandomForestRegressionModel should expose getMaxDepth in PySpark
>
> Key: SPARK-20498
> URL: https://issues.apache.org/jira/browse/SPARK-20498
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Assignee: Xin Ren
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
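For reference, the Scala estimator and model already expose this getter through the shared decision-tree params, which is what the PySpark (and SparkR) wrappers need to mirror; a minimal sketch (the column names are placeholders):

{code}
import org.apache.spark.ml.regression.RandomForestRegressor

// maxDepth lives in the shared DecisionTreeParams trait, so Scala users
// already get a typed getter on both the estimator and the fitted model.
val rf = new RandomForestRegressor()
  .setLabelCol("label")       // placeholder column names
  .setFeaturesCol("features")
  .setMaxDepth(5)

println(rf.getMaxDepth)       // 5; the Python wrapper should mirror this getter
{code}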
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel summary should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974188#comment-15974188 ] Xin Ren commented on SPARK-19282:

Sorry [~bryanc], I'm just back from vacation... and sure, I'd love to help, just let me know :)

> RandomForestRegressionModel summary should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark, SparkR
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Assignee: Xin Ren
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel summary should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974186#comment-15974186 ] Xin Ren commented on SPARK-19282:

Yes, on the R side both parameters are exposed.

> RandomForestRegressionModel summary should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark, SparkR
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Assignee: Xin Ren
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel summary should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924728#comment-15924728 ] Xin Ren commented on SPARK-19282:

Thanks Bryan. Could you please create some sub-tasks under SPARK-10931? I'd like to help on them if possible.

> RandomForestRegressionModel summary should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark, SparkR
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Assignee: Xin Ren
> Priority: Minor
> Fix For: 2.2.0
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel summary should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924374#comment-15924374 ] Xin Ren commented on SPARK-19282:

Sure, I'm working on the Python part :)

> RandomForestRegressionModel summary should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark, SparkR
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Assignee: Xin Ren
> Priority: Minor
> Fix For: 2.2.0
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel summary should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923701#comment-15923701 ] Xin Ren commented on SPARK-19282:

Hi Nick, just to double-check that I understand you correctly: you'd like to expose the parameter `maxDepth` in the Python module too, right?

> RandomForestRegressionModel summary should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark, SparkR
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Assignee: Xin Ren
> Priority: Minor
> Fix For: 2.2.0
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19866) Add local version of Word2Vec findSynonyms for spark.ml: Python API
[ https://issues.apache.org/jira/browse/SPARK-19866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902522#comment-15902522 ] Xin Ren commented on SPARK-19866:

I can try this one :)

> Add local version of Word2Vec findSynonyms for spark.ml: Python API
>
> Key: SPARK-19866
> URL: https://issues.apache.org/jira/browse/SPARK-19866
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Affects Versions: 2.2.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> Add Python API for findSynonymsArray matching the Scala API in the linked JIRA.
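The Scala side of the linked JIRA returns synonyms locally as an array instead of a DataFrame; a minimal sketch of the call the Python API would mirror (assumes a fitted {{ml.feature.Word2VecModel}}):

{code}
import org.apache.spark.ml.feature.Word2VecModel

// findSynonymsArray returns the top-num (word, cosineSimilarity) pairs
// locally, without launching a distributed job.
def topSynonyms(model: Word2VecModel, word: String, num: Int): Array[(String, Double)] =
  model.findSynonymsArray(word, num)
{code}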
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858744#comment-15858744 ] Xin Ren commented on SPARK-19282:

I just got approval from my company to work on this one, so I'm resuming my work on this task :)

> RandomForestRegressionModel should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833973#comment-15833973 ] Xin Ren commented on SPARK-19282:

Sorry Nick, I can't make time for this fix now. Could anyone else please take a look? Thanks a lot.

> RandomForestRegressionModel should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832022#comment-15832022 ] Xin Ren commented on SPARK-19282:

Thank you Nick. I'll give fixing it a try. :)

> RandomForestRegressionModel should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-19282) RandomForestRegressionModel should expose getMaxDepth
[ https://issues.apache.org/jira/browse/SPARK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831342#comment-15831342 ] Xin Ren commented on SPARK-19282:

Sorry for being naive, I'm not familiar with random forests, but is "max depth" an important metric/param of an RF model?

> RandomForestRegressionModel should expose getMaxDepth
>
> Key: SPARK-19282
> URL: https://issues.apache.org/jira/browse/SPARK-19282
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Nick Lothian
> Priority: Minor
>
> Currently it isn't clear how to get the max depth of a RandomForestRegressionModel (e.g., after doing a grid search).
> It is possible to call
> {{regressor._java_obj.getMaxDepth()}}
> but most other decision trees allow
> {{regressor.getMaxDepth()}}
[jira] [Commented] (SPARK-18907) Fix flaky test: o.a.s.sql.streaming.FileStreamSourceSuite max files per trigger - incorrect values
[ https://issues.apache.org/jira/browse/SPARK-18907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826924#comment-15826924 ] Xin Ren commented on SPARK-18907:

Thanks Shixiong :P

> Fix flaky test: o.a.s.sql.streaming.FileStreamSourceSuite max files per trigger - incorrect values
>
> Key: SPARK-18907
> URL: https://issues.apache.org/jira/browse/SPARK-18907
> Project: Spark
> Issue Type: Test
> Components: Structured Streaming, Tests
> Reporter: Shixiong Zhu
> Priority: Minor
[jira] [Commented] (SPARK-18907) Fix flaky test: o.a.s.sql.streaming.FileStreamSourceSuite max files per trigger - incorrect values
[ https://issues.apache.org/jira/browse/SPARK-18907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826909#comment-15826909 ] Xin Ren commented on SPARK-18907:

Hi Shixiong, what do you mean by flaky? Should we intercept these exceptions? I'm not sure about the expected outcome of this test case, thanks a lot.

{code}
13:54:38.697 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13:54:50.257 ERROR org.apache.spark.sql.execution.streaming.StreamExecution: Query maxFilesPerTrigger_test [id = 118d8397-3dab-49f4-a1e7-eb8ec7dd4fc2, runId = ae622c2d-eb0e-4647-beca-907b5dac59b0] terminated with error
java.lang.IllegalArgumentException: Invalid value 'not-a-integer' for option 'maxFilesPerTrigger', must be a positive integer
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2$$anonfun$apply$3.apply(FileStreamOptions.scala:35)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2$$anonfun$apply$3.apply(FileStreamOptions.scala:35)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2.apply(FileStreamOptions.scala:34)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2.apply(FileStreamOptions.scala:33)
  at scala.Option.map(Option.scala:146)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions.<init>(FileStreamOptions.scala:33)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions.<init>(FileStreamOptions.scala:31)
  at org.apache.spark.sql.execution.streaming.FileStreamSource.<init>(FileStreamSource.scala:44)
  at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:256)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:140)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:136)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:287)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:277)
  at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan$lzycompute(StreamExecution.scala:136)
  at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan(StreamExecution.scala:131)
  at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:246)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:186)
13:54:52.364 ERROR org.apache.spark.sql.execution.streaming.StreamExecution: Query maxFilesPerTrigger_test [id = 6c42063d-39f4-4722-b529-fc3d379c691d, runId = df0c4fae-49db-4be3-8270-662fe0947559] terminated with error
java.lang.IllegalArgumentException: Invalid value '-1' for option 'maxFilesPerTrigger', must be a positive integer
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2$$anonfun$apply$3.apply(FileStreamOptions.scala:35)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2$$anonfun$apply$3.apply(FileStreamOptions.scala:35)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2.apply(FileStreamOptions.scala:34)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions$$anonfun$2.apply(FileStreamOptions.scala:33)
  at scala.Option.map(Option.scala:146)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions.<init>(FileStreamOptions.scala:33)
  at org.apache.spark.sql.execution.streaming.FileStreamOptions.<init>(FileStreamOptions.scala:31)
  at org.apache.spark.sql.execution.streaming.FileStreamSource.<init>(FileStreamSource.scala:44)
  at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:256)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:140)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:136)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at
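On the question of intercepting the exceptions: a common shape for deflaking this kind of negative test is to fail the query deterministically and assert on the cause, rather than racing the stream thread's logging. A rough sketch under that assumption (not the actual patch; `spark` and `path` are assumed to come from the test suite):

{code}
import org.scalatest.Assertions._
import org.apache.spark.sql.streaming.StreamingQueryException

// Start a file stream with an invalid maxFilesPerTrigger, wait for the
// query thread to fail, and assert on the root cause deterministically.
val e = intercept[StreamingQueryException] {
  val query = spark.readStream
    .format("text")
    .option("maxFilesPerTrigger", "not-a-integer") // invalid on purpose
    .load(path)                                    // `path` is assumed
    .writeStream
    .format("memory")
    .queryName("maxFilesPerTrigger_test")
    .start()
  query.awaitTermination()
}
assert(e.getCause.isInstanceOf[IllegalArgumentException])
{code}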
[jira] [Commented] (SPARK-17724) Unevaluated new lines in tooltip in DAG Visualization of a job
[ https://issues.apache.org/jira/browse/SPARK-17724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543994#comment-15543994 ] Xin Ren commented on SPARK-17724:

I can give it a try.

> Unevaluated new lines in tooltip in DAG Visualization of a job
>
> Key: SPARK-17724
> URL: https://issues.apache.org/jira/browse/SPARK-17724
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 2.1.0
> Reporter: Jacek Laskowski
> Priority: Minor
> Attachments: spark-webui-job-details-dagvisualization-newlines-broken.png
>
> The tooltips in DAG Visualization for a job show new lines verbatim (unevaluated).
[jira] [Created] (SPARK-17628) Name of "object StreamingExamples" should be more self-explanatory
Xin Ren created SPARK-17628:

Summary: Name of "object StreamingExamples" should be more self-explanatory
Key: SPARK-17628
URL: https://issues.apache.org/jira/browse/SPARK-17628
Project: Spark
Issue Type: Bug
Components: Examples, Streaming
Affects Versions: 2.0.0
Reporter: Xin Ren
Priority: Minor

`object StreamingExamples` is more of a utility object, and the name is too general; at first I thought it was an actual streaming example.

{code}
/** Utility functions for Spark Streaming examples. */
object StreamingExamples extends Logging {

  /** Set reasonable logging levels for streaming if the user has not configured log4j. */
  def setStreamingLogLevels() {
    val log4jInitialized = Logger.getRootLogger.getAllAppenders.hasMoreElements
    if (!log4jInitialized) {
      // We first log something to initialize Spark's default logging, then we override the
      // logging level.
      logInfo("Setting log level to [WARN] for streaming example." +
        " To override add a custom log4j.properties to the classpath.")
      Logger.getRootLogger.setLevel(Level.WARN)
    }
  }
}
{code}
[jira] [Commented] (SPARK-17476) Proper handling for unseen labels in logistic regression training.
[ https://issues.apache.org/jira/browse/SPARK-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478517#comment-15478517 ] Xin Ren commented on SPARK-17476:

Hi, I can try to work on this one, thanks :)

> Proper handling for unseen labels in logistic regression training.
>
> Key: SPARK-17476
> URL: https://issues.apache.org/jira/browse/SPARK-17476
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Seth Hendrickson
>
> Now that logistic regression supports multiclass, it is possible to train on data that has {{K}} classes, but one or more of the classes does not appear in training. For example,
> {code}
> (0.0, x1)
> (2.0, x2)
> ...
> {code}
> Currently, logistic regression assumes that the outcome classes in the above dataset have three levels: {{0, 1, 2}}. Since label 1 never appears, it should never be predicted. In theory, the coefficients should be zero and the intercept should be negative infinity. This can cause problems since we center the intercepts after training.
> We should discuss whether or not the intercepts actually tend to -infinity in practice, and whether or not we should even include them in training.
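The negative-infinity claim follows directly from the likelihood; a short sketch under the standard softmax parameterization (notation mine, not from the issue):

{code}
\[
  P(y = k \mid x) \;=\; \frac{\exp(\beta_{0k} + \beta_k^{\top} x)}
                             {\sum_{j=0}^{K-1} \exp(\beta_{0j} + \beta_j^{\top} x)}
\]
% If class k never occurs in training, every term of the log-likelihood
% increases as P(y = k | x) decreases, so the optimizer pushes
% beta_k = 0 and beta_{0k} -> -infinity: the maximum lies on the boundary,
% and the fitted intercept diverges rather than converging.
{code}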
[jira] [Updated] (SPARK-17276) Stop environment parameters flooding Jenkins build output
[ https://issues.apache.org/jira/browse/SPARK-17276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-17276:

Attachment: Screen Shot 2016-08-26 at 10.52.07 PM.png

> Stop environment parameters flooding Jenkins build output
>
> Key: SPARK-17276
> URL: https://issues.apache.org/jira/browse/SPARK-17276
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, Tests
> Affects Versions: 2.0.0
> Reporter: Xin Ren
> Priority: Minor
> Attachments: Screen Shot 2016-08-26 at 10.52.07 PM.png
>
> When I was trying to find an error msg in a failed Jenkins build job, I was annoyed by the huge env output. The env parameter output should be muted.
> {code}
> [info] PipedRDDSuite:
> [info] - basic pipe (51 milliseconds)
> 0 0 0
> [info] - basic pipe with tokenization (60 milliseconds)
> [info] - failure in iterating over pipe input (49 milliseconds)
> [info] - advanced pipe (100 milliseconds)
> [info] - pipe with empty partition (117 milliseconds)
> PATH=/home/anaconda/envs/py3k/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:/usr/java/jdk1.8.0_60/bin:/home/jenkins/tools/hudson.model.JDK/JDK_7u60/bin:/home/jenkins/.cargo/bin:/home/anaconda/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.1.1/bin/:/home/android-sdk/:/usr/local/bin:/bin:/usr/bin:/home/anaconda/envs/py3k/bin
> BUILD_CAUSE_GHPRBCAUSE=true
> SBT_MAVEN_PROFILES=-Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver
> HUDSON_HOME=/var/lib/jenkins
> AWS_SECRET_ACCESS_KEY=
> JOB_URL=https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
> HUDSON_COOKIE=638da3d2-d27a-4724-b41a-5ff6e8ce6752
> LINES=24
> CURRENT_BLOCK=18
> ANDROID_HOME=/home/android-sdk/
> ghprbActualCommit=70a751c6959048e65c083ab775b01523da4578a2
> ghprbSourceBranch=codeWalkThroughML
> GITHUB_OAUTH_KEY=
> MAIL=/var/mail/jenkins
> AMPLAB_JENKINS=1
> JENKINS_SERVER_COOKIE=472906e9832aeb79
> ghprbPullTitle=[MINOR][MLlib][SQL] Clean up unused variables and unused import
> LOGNAME=jenkins
> PWD=/home/jenkins/workspace/SparkPullRequestBuilder
> JENKINS_URL=https://amplab.cs.berkeley.edu/jenkins/
> SPARK_VERSIONS_SUITE_IVY_PATH=/home/sparkivy/per-executor-caches/9/.ivy2
> ROOT_BUILD_CAUSE_GHPRBCAUSE=true
> ghprbActualCommitAuthorEmail=iamsh...@126.com
> ghprbTargetBranch=master
> BUILD_TAG=jenkins-SparkPullRequestBuilder-64504
> SHELL=/bin/bash
> ROOT_BUILD_CAUSE=GHPRBCAUSE
> SBT_OPTS=-Duser.home=/home/sparkivy/per-executor-caches/9 -Dsbt.ivy.home=/home/sparkivy/per-executor-caches/9/.ivy2
> JENKINS_HOME=/var/lib/jenkins
> sha1=origin/pr/14836/merge
> ghprbPullDescription=GitHub pull request #14836 of commit 70a751c6959048e65c083ab775b01523da4578a2 automatically merged.
> NODE_NAME=amp-jenkins-worker-02
> BUILD_DISPLAY_NAME=#64504
> JAVA_7_HOME=/usr/java/jdk1.7.0_79
> GIT_BRANCH=codeWalkThroughML
> SHLVL=3
> AMP_JENKINS_PRB=true
> JAVA_HOME=/usr/java/jdk1.8.0_60
> JENKINS_MASTER_HOSTNAME=amp-jenkins-master
> BUILD_ID=64504
> XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
> ghprbPullLink=https://api.github.com/repos/apache/spark/pulls/14836
> JOB_NAME=SparkPullRequestBuilder
> BUILD_CAUSE=GHPRBCAUSE
> SPARK_SCALA_VERSION=2.11
> AWS_ACCESS_KEY_ID=
> NODE_LABELS=amp-jenkins-worker-02 centos spark-compile spark-test
> HUDSON_URL=https://amplab.cs.berkeley.edu/jenkins/
> SPARK_PREPEND_CLASSES=1
> COLUMNS=80
> WORKSPACE=/home/jenkins/workspace/SparkPullRequestBuilder
> SPARK_TESTING=1
> _=/usr/java/jdk1.8.0_60/bin/java
> GIT_COMMIT=b31b82bcc9d8767561ee720c9e7192252f4fd3fc
> ghprbPullId=14836
> EXECUTOR_NUMBER=9
> SSH_CLIENT=192.168.10.10 44762 22
> HUDSON_SERVER_COOKIE=472906e9832aeb79
> cat: nonexistent_file: No such file or directory
> cat: nonexistent_file: No such file or directory
>
[jira] [Commented] (SPARK-17276) Stop environment parameters flooding Jenkins build output
[ https://issues.apache.org/jira/browse/SPARK-17276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440782#comment-15440782 ] Xin Ren commented on SPARK-17276:

I'm working on it.

> Stop environment parameters flooding Jenkins build output
>
> Key: SPARK-17276
> URL: https://issues.apache.org/jira/browse/SPARK-17276
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, Tests
> Affects Versions: 2.0.0
> Reporter: Xin Ren
> Priority: Minor
> Attachments: Screen Shot 2016-08-26 at 10.52.07 PM.png
>
> When I was trying to find an error msg in a failed Jenkins build job, I was annoyed by the huge env output. The env parameter output should be muted.
> {code}
> [info] PipedRDDSuite:
> [info] - basic pipe (51 milliseconds)
> 0 0 0
> [info] - basic pipe with tokenization (60 milliseconds)
> [info] - failure in iterating over pipe input (49 milliseconds)
> [info] - advanced pipe (100 milliseconds)
> [info] - pipe with empty partition (117 milliseconds)
> PATH=/home/anaconda/envs/py3k/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:/usr/java/jdk1.8.0_60/bin:/home/jenkins/tools/hudson.model.JDK/JDK_7u60/bin:/home/jenkins/.cargo/bin:/home/anaconda/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.1.1/bin/:/home/android-sdk/:/usr/local/bin:/bin:/usr/bin:/home/anaconda/envs/py3k/bin
> BUILD_CAUSE_GHPRBCAUSE=true
> SBT_MAVEN_PROFILES=-Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver
> HUDSON_HOME=/var/lib/jenkins
> AWS_SECRET_ACCESS_KEY=
> JOB_URL=https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
> HUDSON_COOKIE=638da3d2-d27a-4724-b41a-5ff6e8ce6752
> LINES=24
> CURRENT_BLOCK=18
> ANDROID_HOME=/home/android-sdk/
> ghprbActualCommit=70a751c6959048e65c083ab775b01523da4578a2
> ghprbSourceBranch=codeWalkThroughML
> GITHUB_OAUTH_KEY=
> MAIL=/var/mail/jenkins
> AMPLAB_JENKINS=1
> JENKINS_SERVER_COOKIE=472906e9832aeb79
> ghprbPullTitle=[MINOR][MLlib][SQL] Clean up unused variables and unused import
> LOGNAME=jenkins
> PWD=/home/jenkins/workspace/SparkPullRequestBuilder
> JENKINS_URL=https://amplab.cs.berkeley.edu/jenkins/
> SPARK_VERSIONS_SUITE_IVY_PATH=/home/sparkivy/per-executor-caches/9/.ivy2
> ROOT_BUILD_CAUSE_GHPRBCAUSE=true
> ghprbActualCommitAuthorEmail=iamsh...@126.com
> ghprbTargetBranch=master
> BUILD_TAG=jenkins-SparkPullRequestBuilder-64504
> SHELL=/bin/bash
> ROOT_BUILD_CAUSE=GHPRBCAUSE
> SBT_OPTS=-Duser.home=/home/sparkivy/per-executor-caches/9 -Dsbt.ivy.home=/home/sparkivy/per-executor-caches/9/.ivy2
> JENKINS_HOME=/var/lib/jenkins
> sha1=origin/pr/14836/merge
> ghprbPullDescription=GitHub pull request #14836 of commit 70a751c6959048e65c083ab775b01523da4578a2 automatically merged.
> NODE_NAME=amp-jenkins-worker-02
> BUILD_DISPLAY_NAME=#64504
> JAVA_7_HOME=/usr/java/jdk1.7.0_79
> GIT_BRANCH=codeWalkThroughML
> SHLVL=3
> AMP_JENKINS_PRB=true
> JAVA_HOME=/usr/java/jdk1.8.0_60
> JENKINS_MASTER_HOSTNAME=amp-jenkins-master
> BUILD_ID=64504
> XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
> ghprbPullLink=https://api.github.com/repos/apache/spark/pulls/14836
> JOB_NAME=SparkPullRequestBuilder
> BUILD_CAUSE=GHPRBCAUSE
> SPARK_SCALA_VERSION=2.11
> AWS_ACCESS_KEY_ID=
> NODE_LABELS=amp-jenkins-worker-02 centos spark-compile spark-test
> HUDSON_URL=https://amplab.cs.berkeley.edu/jenkins/
> SPARK_PREPEND_CLASSES=1
> COLUMNS=80
> WORKSPACE=/home/jenkins/workspace/SparkPullRequestBuilder
> SPARK_TESTING=1
> _=/usr/java/jdk1.8.0_60/bin/java
> GIT_COMMIT=b31b82bcc9d8767561ee720c9e7192252f4fd3fc
> ghprbPullId=14836
> EXECUTOR_NUMBER=9
> SSH_CLIENT=192.168.10.10 44762 22
> HUDSON_SERVER_COOKIE=472906e9832aeb79
> cat: nonexistent_file: No such file or directory
> cat: nonexistent_file: No such file or directory
>
[jira] [Created] (SPARK-17276) Stop environment parameters flooding Jenkins build output
Xin Ren created SPARK-17276:

Summary: Stop environment parameters flooding Jenkins build output
Key: SPARK-17276
URL: https://issues.apache.org/jira/browse/SPARK-17276
Project: Spark
Issue Type: Improvement
Components: Spark Core, Tests
Affects Versions: 2.0.0
Reporter: Xin Ren
Priority: Minor

When I was trying to find an error msg in a failed Jenkins build job, I was annoyed by the huge env output. The env parameter output should be muted.

{code}
[info] PipedRDDSuite:
[info] - basic pipe (51 milliseconds)
0 0 0
[info] - basic pipe with tokenization (60 milliseconds)
[info] - failure in iterating over pipe input (49 milliseconds)
[info] - advanced pipe (100 milliseconds)
[info] - pipe with empty partition (117 milliseconds)
PATH=/home/anaconda/envs/py3k/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:/usr/java/jdk1.8.0_60/bin:/home/jenkins/tools/hudson.model.JDK/JDK_7u60/bin:/home/jenkins/.cargo/bin:/home/anaconda/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.1.1/bin/:/home/android-sdk/:/usr/local/bin:/bin:/usr/bin:/home/anaconda/envs/py3k/bin
BUILD_CAUSE_GHPRBCAUSE=true
SBT_MAVEN_PROFILES=-Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver
HUDSON_HOME=/var/lib/jenkins
AWS_SECRET_ACCESS_KEY=
JOB_URL=https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
HUDSON_COOKIE=638da3d2-d27a-4724-b41a-5ff6e8ce6752
LINES=24
CURRENT_BLOCK=18
ANDROID_HOME=/home/android-sdk/
ghprbActualCommit=70a751c6959048e65c083ab775b01523da4578a2
ghprbSourceBranch=codeWalkThroughML
GITHUB_OAUTH_KEY=
MAIL=/var/mail/jenkins
AMPLAB_JENKINS=1
JENKINS_SERVER_COOKIE=472906e9832aeb79
ghprbPullTitle=[MINOR][MLlib][SQL] Clean up unused variables and unused import
LOGNAME=jenkins
PWD=/home/jenkins/workspace/SparkPullRequestBuilder
JENKINS_URL=https://amplab.cs.berkeley.edu/jenkins/
SPARK_VERSIONS_SUITE_IVY_PATH=/home/sparkivy/per-executor-caches/9/.ivy2
ROOT_BUILD_CAUSE_GHPRBCAUSE=true
ghprbActualCommitAuthorEmail=iamsh...@126.com
ghprbTargetBranch=master
BUILD_TAG=jenkins-SparkPullRequestBuilder-64504
SHELL=/bin/bash
ROOT_BUILD_CAUSE=GHPRBCAUSE
SBT_OPTS=-Duser.home=/home/sparkivy/per-executor-caches/9 -Dsbt.ivy.home=/home/sparkivy/per-executor-caches/9/.ivy2
JENKINS_HOME=/var/lib/jenkins
sha1=origin/pr/14836/merge
ghprbPullDescription=GitHub pull request #14836 of commit 70a751c6959048e65c083ab775b01523da4578a2 automatically merged.
NODE_NAME=amp-jenkins-worker-02
BUILD_DISPLAY_NAME=#64504
JAVA_7_HOME=/usr/java/jdk1.7.0_79
GIT_BRANCH=codeWalkThroughML
SHLVL=3
AMP_JENKINS_PRB=true
JAVA_HOME=/usr/java/jdk1.8.0_60
JENKINS_MASTER_HOSTNAME=amp-jenkins-master
BUILD_ID=64504
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
ghprbPullLink=https://api.github.com/repos/apache/spark/pulls/14836
JOB_NAME=SparkPullRequestBuilder
BUILD_CAUSE=GHPRBCAUSE
SPARK_SCALA_VERSION=2.11
AWS_ACCESS_KEY_ID=
NODE_LABELS=amp-jenkins-worker-02 centos spark-compile spark-test
HUDSON_URL=https://amplab.cs.berkeley.edu/jenkins/
SPARK_PREPEND_CLASSES=1
COLUMNS=80
WORKSPACE=/home/jenkins/workspace/SparkPullRequestBuilder
SPARK_TESTING=1
_=/usr/java/jdk1.8.0_60/bin/java
GIT_COMMIT=b31b82bcc9d8767561ee720c9e7192252f4fd3fc
ghprbPullId=14836
EXECUTOR_NUMBER=9
SSH_CLIENT=192.168.10.10 44762 22
HUDSON_SERVER_COOKIE=472906e9832aeb79
cat: nonexistent_file: No such file or directory
cat: nonexistent_file: No such file or directory
[jira] [Commented] (SPARK-17241) SparkR spark.glm should have configurable regularization parameter
[ https://issues.apache.org/jira/browse/SPARK-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437588#comment-15437588 ] Xin Ren commented on SPARK-17241:

I can work on this one :)

> SparkR spark.glm should have configurable regularization parameter
>
> Key: SPARK-17241
> URL: https://issues.apache.org/jira/browse/SPARK-17241
> Project: Spark
> Issue Type: Improvement
> Reporter: Junyang Qian
>
> Spark has a configurable L2 regularization parameter for generalized linear regression. It is very important to have it in SparkR so that users can run ridge regression.
[jira] [Updated] (SPARK-17174) Provide support for Timestamp type Column in add_months function to return HH:mm:ss
[ https://issues.apache.org/jira/browse/SPARK-17174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-17174:

Component/s: SQL

> Provide support for Timestamp type Column in add_months function to return HH:mm:ss
>
> Key: SPARK-17174
> URL: https://issues.apache.org/jira/browse/SPARK-17174
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 2.0.0
> Reporter: Amit Baghel
> Priority: Minor
>
> The add_months function currently supports Date types. If the Column is Timestamp type, it adds months to the date but doesn't return the timestamp part (HH:mm:ss). See the code below.
> {code}
> import java.util.Calendar
> val now = Calendar.getInstance().getTime()
> val df = sc.parallelize((0 to 3).map(i => {now.setMonth(i); (i, new java.sql.Timestamp(now.getTime))}).toSeq).toDF("ID", "DateWithTS")
> df.withColumn("NewDateWithTS", add_months(df("DateWithTS"),1)).show
> {code}
> The code above gives the following response. Note that HH:mm:ss is missing from the NewDateWithTS column.
> {code}
> +---+--------------------+-------------+
> | ID|          DateWithTS|NewDateWithTS|
> +---+--------------------+-------------+
> |  0|2016-01-21 09:38:...|   2016-02-21|
> |  1|2016-02-21 09:38:...|   2016-03-21|
> |  2|2016-03-21 09:38:...|   2016-04-21|
> |  3|2016-04-21 09:38:...|   2016-05-21|
> +---+--------------------+-------------+
> {code}
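One possible workaround until this is addressed: interval arithmetic on a timestamp column preserves the time-of-day part, unlike {{add_months}}, which returns a date. A minimal sketch reusing the {{df}} from the description:

{code}
import org.apache.spark.sql.functions.expr

// Adding a calendar interval keeps HH:mm:ss, unlike add_months.
val withMonthAdded = df.withColumn("NewDateWithTS", expr("DateWithTS + INTERVAL 1 month"))
withMonthAdded.show(truncate = false)
{code}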
[jira] [Commented] (SPARK-17157) Add multiclass logistic regression SparkR Wrapper
[ https://issues.apache.org/jira/browse/SPARK-17157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428751#comment-15428751 ] Xin Ren commented on SPARK-17157:

I guess a lot more ML algorithms are still missing R wrappers?

> Add multiclass logistic regression SparkR Wrapper
>
> Key: SPARK-17157
> URL: https://issues.apache.org/jira/browse/SPARK-17157
> Project: Spark
> Issue Type: New Feature
> Components: SparkR
> Reporter: Miao Wang
>
> [SPARK-7159][ML] Add multiclass logistic regression to Spark ML has been merged to master. I opened this JIRA for discussion of adding a SparkR wrapper for multiclass logistic regression.
[jira] [Commented] (SPARK-17133) Improvements to linear methods in Spark
[ https://issues.apache.org/jira/browse/SPARK-17133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427144#comment-15427144 ] Xin Ren commented on SPARK-17133:

Hi [~sethah], I'd like to help on this, please count me in. Thanks a lot :)

> Improvements to linear methods in Spark
>
> Key: SPARK-17133
> URL: https://issues.apache.org/jira/browse/SPARK-17133
> Project: Spark
> Issue Type: Umbrella
> Components: ML, MLlib
> Reporter: Seth Hendrickson
>
> This JIRA is for tracking several improvements that we should make to linear/logistic regression in Spark.
[jira] [Commented] (SPARK-17038) StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
[ https://issues.apache.org/jira/browse/SPARK-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424017#comment-15424017 ] Xin Ren commented on SPARK-17038:

Hi [~ozzieba], I guess you're too busy to respond or submit a PR, so I'm submitting a PR now. Really sorry for not waiting longer.

> StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
>
> Key: SPARK-17038
> URL: https://issues.apache.org/jira/browse/SPARK-17038
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.6.2, 2.0.0
> Reporter: Oz Ben-Ami
> Priority: Minor
> Labels: metrics
>
> StreamingSource's lastReceivedBatch_submissionTime, lastReceivedBatch_processingTimeStart, and lastReceivedBatch_processingTimeEnd all use data from lastCompletedBatch instead of lastReceivedBatch. In particular, this makes it impossible to match lastReceivedBatch_records with a batchID/submission time.
> This is apparent when looking at StreamingSource.scala, lines 89-94.
> I would guess that just replacing Completed->Received in those lines would fix the issue, but I haven't tested it.
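The suspected fix is essentially a one-word change; a self-contained model of it (class and method names here are illustrative stand-ins, not the actual contents of StreamingSource.scala):

{code}
// Model of the bug: gauges labelled lastReceivedBatch_* must read the
// received-batch info, not the completed-batch info.
case class BatchUIData(submissionTime: Long)

class ListenerModel(
    val lastReceivedBatch: Option[BatchUIData],
    val lastCompletedBatch: Option[BatchUIData])

def lastReceivedBatchSubmissionTime(l: ListenerModel): Long =
  l.lastReceivedBatch.map(_.submissionTime).getOrElse(-1L) // was: lastCompletedBatch
{code}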
[jira] [Issue Comment Deleted] (SPARK-17038) StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
[ https://issues.apache.org/jira/browse/SPARK-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-17038:

Comment: was deleted

(was: Hi [~ozzieba], I guess you're too busy to respond or submit a PR, so I'm submitting a PR now. Really sorry for not waiting longer.)

> StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
>
> Key: SPARK-17038
> URL: https://issues.apache.org/jira/browse/SPARK-17038
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.6.2, 2.0.0
> Reporter: Oz Ben-Ami
> Priority: Minor
> Labels: metrics
>
> StreamingSource's lastReceivedBatch_submissionTime, lastReceivedBatch_processingTimeStart, and lastReceivedBatch_processingTimeEnd all use data from lastCompletedBatch instead of lastReceivedBatch. In particular, this makes it impossible to match lastReceivedBatch_records with a batchID/submission time.
> This is apparent when looking at StreamingSource.scala, lines 89-94.
> I would guess that just replacing Completed->Received in those lines would fix the issue, but I haven't tested it.
[jira] [Commented] (SPARK-17038) StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
[ https://issues.apache.org/jira/browse/SPARK-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424015#comment-15424015 ] Xin Ren commented on SPARK-17038:

Hi [~ozzieba], I guess you're too busy to respond or submit a PR, so I'm submitting a PR now. Really sorry for not waiting longer.

> StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
>
> Key: SPARK-17038
> URL: https://issues.apache.org/jira/browse/SPARK-17038
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.6.2, 2.0.0
> Reporter: Oz Ben-Ami
> Priority: Minor
> Labels: metrics
>
> StreamingSource's lastReceivedBatch_submissionTime, lastReceivedBatch_processingTimeStart, and lastReceivedBatch_processingTimeEnd all use data from lastCompletedBatch instead of lastReceivedBatch. In particular, this makes it impossible to match lastReceivedBatch_records with a batchID/submission time.
> This is apparent when looking at StreamingSource.scala, lines 89-94.
> I would guess that just replacing Completed->Received in those lines would fix the issue, but I haven't tested it.
[jira] [Commented] (SPARK-17038) StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
[ https://issues.apache.org/jira/browse/SPARK-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421789#comment-15421789 ] Xin Ren commented on SPARK-17038:

Hi [~ozzieba], if you don't have time, I can just submit a quick patch for this :)

> StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
>
> Key: SPARK-17038
> URL: https://issues.apache.org/jira/browse/SPARK-17038
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.6.2, 2.0.0
> Reporter: Oz Ben-Ami
> Priority: Minor
> Labels: metrics
>
> StreamingSource's lastReceivedBatch_submissionTime, lastReceivedBatch_processingTimeStart, and lastReceivedBatch_processingTimeEnd all use data from lastCompletedBatch instead of lastReceivedBatch. In particular, this makes it impossible to match lastReceivedBatch_records with a batchID/submission time.
> This is apparent when looking at StreamingSource.scala, lines 89-94.
> I would guess that just replacing Completed->Received in those lines would fix the issue, but I haven't tested it.
[jira] [Resolved] (SPARK-17026) warning msg in MulticlassMetricsSuite
[ https://issues.apache.org/jira/browse/SPARK-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren resolved SPARK-17026.

Resolution: Not A Problem

> warning msg in MulticlassMetricsSuite
>
> Key: SPARK-17026
> URL: https://issues.apache.org/jira/browse/SPARK-17026
> Project: Spark
> Issue Type: Improvement
> Reporter: Xin Ren
> Priority: Trivial
>
> Got warnings when building:
> {code}
> [warn] /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala:74: value precision in class MulticlassMetrics is deprecated: Use accuracy.
> [warn] assert(math.abs(metrics.accuracy - metrics.precision) < delta)
> [warn] ^
> [warn] /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala:75: value recall in class MulticlassMetrics is deprecated: Use accuracy.
> [warn] assert(math.abs(metrics.accuracy - metrics.recall) < delta)
> [warn] ^
> [warn] /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala:76: value fMeasure in class MulticlassMetrics is deprecated: Use accuracy.
> [warn] assert(math.abs(metrics.accuracy - metrics.fMeasure) < delta)
> [warn] ^
> {code}
> And `precision`, `recall`, and `fMeasure` are all referencing `accuracy`:
> {code}
> assert(math.abs(metrics.accuracy - metrics.precision) < delta)
> assert(math.abs(metrics.accuracy - metrics.recall) < delta)
> assert(math.abs(metrics.accuracy - metrics.fMeasure) < delta)
> {code}
> {code}
> /**
>  * Returns precision
>  */
> @Since("1.1.0")
> @deprecated("Use accuracy.", "2.0.0")
> lazy val precision: Double = accuracy
>
> /**
>  * Returns recall
>  * (equals to precision for multiclass classifier
>  * because sum of all false positives is equal to sum
>  * of all false negatives)
>  */
> @Since("1.1.0")
> @deprecated("Use accuracy.", "2.0.0")
> lazy val recall: Double = accuracy
>
> /**
>  * Returns f-measure
>  * (equals to precision and recall because precision equals recall)
>  */
> @Since("1.1.0")
> @deprecated("Use accuracy.", "2.0.0")
> lazy val fMeasure: Double = accuracy
> {code}
[jira] [Created] (SPARK-17026) warning msg in MulticlassMetricsSuite
Xin Ren created SPARK-17026:

Summary: warning msg in MulticlassMetricsSuite
Key: SPARK-17026
URL: https://issues.apache.org/jira/browse/SPARK-17026
Project: Spark
Issue Type: Improvement
Reporter: Xin Ren
Priority: Trivial

Got warnings when building:

{code}
[warn] /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala:74: value precision in class MulticlassMetrics is deprecated: Use accuracy.
[warn] assert(math.abs(metrics.accuracy - metrics.precision) < delta)
[warn] ^
[warn] /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala:75: value recall in class MulticlassMetrics is deprecated: Use accuracy.
[warn] assert(math.abs(metrics.accuracy - metrics.recall) < delta)
[warn] ^
[warn] /home/jenkins/workspace/SparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala:76: value fMeasure in class MulticlassMetrics is deprecated: Use accuracy.
[warn] assert(math.abs(metrics.accuracy - metrics.fMeasure) < delta)
[warn] ^
{code}

And `precision`, `recall`, and `fMeasure` are all referencing `accuracy`:

{code}
assert(math.abs(metrics.accuracy - metrics.precision) < delta)
assert(math.abs(metrics.accuracy - metrics.recall) < delta)
assert(math.abs(metrics.accuracy - metrics.fMeasure) < delta)
{code}

{code}
/**
 * Returns precision
 */
@Since("1.1.0")
@deprecated("Use accuracy.", "2.0.0")
lazy val precision: Double = accuracy

/**
 * Returns recall
 * (equals to precision for multiclass classifier
 * because sum of all false positives is equal to sum
 * of all false negatives)
 */
@Since("1.1.0")
@deprecated("Use accuracy.", "2.0.0")
lazy val recall: Double = accuracy

/**
 * Returns f-measure
 * (equals to precision and recall because precision equals recall)
 */
@Since("1.1.0")
@deprecated("Use accuracy.", "2.0.0")
lazy val fMeasure: Double = accuracy
{code}
[jira] [Created] (SPARK-17005) Fix warning "method tpe in trait AnnotationApi is deprecated"
Xin Ren created SPARK-17005:

Summary: Fix warning "method tpe in trait AnnotationApi is deprecated"
Key: SPARK-17005
URL: https://issues.apache.org/jira/browse/SPARK-17005
Project: Spark
Issue Type: Improvement
Components: Examples
Affects Versions: 2.0.0
Reporter: Xin Ren
Priority: Trivial

When running the 'examples' module, there is a warning:

{code}
[warn] /Users/quickmobile/workspace/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:349: method tpe in trait AnnotationApi is deprecated: Use `tree.tpe` instead
[warn] case t if t.typeSymbol.annotations.exists(_.tpe =:= typeOf[SQLUserDefinedType]) =>
[warn] ^
[warn] /Users/quickmobile/workspace/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:551: method tpe in trait AnnotationApi is deprecated: Use `tree.tpe` instead
[warn] case t if t.typeSymbol.annotations.exists(_.tpe =:= typeOf[SQLUserDefinedType]) =>
[warn] ^
[warn] /Users/quickmobile/workspace/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:647: method tpe in trait AnnotationApi is deprecated: Use `tree.tpe` instead
[warn] case t if t.typeSymbol.annotations.exists(_.tpe =:= typeOf[SQLUserDefinedType]) =>
[warn] ^
{code}
[jira] [Created] (SPARK-17004) Fix warning "method declarations in class TypeApi is deprecated"
Xin Ren created SPARK-17004:

Summary: Fix warning "method declarations in class TypeApi is deprecated"
Key: SPARK-17004
URL: https://issues.apache.org/jira/browse/SPARK-17004
Project: Spark
Issue Type: Improvement
Components: Examples
Affects Versions: 2.0.0
Reporter: Xin Ren
Priority: Trivial

When running the 'examples' module, there is a warning:

{code}
[warn] /home/jenkins/workspace/SparkPullRequestBuilder/examples/src/main/scala/org/apache/spark/examples/mllib/AbstractParams.scala:41: method declarations in class TypeApi is deprecated: Use `decls` instead
[warn] val allAccessors = tpe.declarations.collect {
[warn] ^
[warn] one warning found
[warn] /home/jenkins/workspace/SparkPullRequestBuilder/examples/src/main/scala/org/apache/spark/examples/mllib/AbstractParams.scala:41: method declarations in class TypeApi is deprecated: Use `decls` instead
[warn] val allAccessors = tpe.declarations.collect {
[warn]
{code}
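Both this warning and the one in SPARK-17005 have mechanical replacements in the scala-reflect API; a minimal self-contained sketch of the new calls (the {{Person}} class and the {{deprecated}} annotation are stand-ins for the real code's types):

{code}
import scala.reflect.runtime.universe._

case class Person(name: String)

// SPARK-17004: Type#declarations is deprecated in favour of Type#decls.
val accessors = typeOf[Person].decls.collect {
  case m: MethodSymbol if m.isCaseAccessor => m
}

// SPARK-17005: Annotation#tpe is deprecated in favour of Annotation#tree.tpe.
val annotated = typeOf[Person].typeSymbol.annotations
  .exists(_.tree.tpe =:= typeOf[deprecated])
{code}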
[jira] [Comment Edited] (SPARK-16445) Multilayer Perceptron Classifier wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394654#comment-15394654 ] Xin Ren edited comment on SPARK-16445 at 7/26/16 10:17 PM:

I'm still working on it; hopefully by the end of this weekend I can submit a PR :)

I just have a quick question: which parameters should be passed from the R command?

For fit() of the wrapper class, there are many parameters: https://github.com/apache/spark/compare/master...keypointt:SPARK-16445?expand=1#diff-ccb8590441998a896d1b74ca605b56efR62

{code}
def fit(
    formula: String,
    data: DataFrame,
    blockSize: Int,
    layers: Array[Int],
    initialWeights: Vector,
    solver: String,
    seed: Long,
    maxIter: Int,
    tol: Double,
    stepSize: Double): MultilayerPerceptronClassifierWrapper = {
{code}

And for the R part, should I pass all the parameters from the R command? https://github.com/apache/spark/compare/master...keypointt:SPARK-16445?expand=1#diff-7ede1519b4a56647801b51af33c2dd18R461

I find that in the example (http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier), only the parameters below are being set; the rest just use default values:

{code}
val trainer = new MultilayerPerceptronClassifier()
  .setLayers(layers)
  .setBlockSize(128)
  .setSeed(1234L)
  .setMaxIter(100)
{code}

> Multilayer Perceptron Classifier wrapper in SparkR
>
> Key: SPARK-16445
> URL: https://issues.apache.org/jira/browse/SPARK-16445
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, SparkR
> Reporter: Xiangrui Meng
> Assignee: Xin Ren
>
> Follow instructions in SPARK-16442 and implement a multilayer perceptron classifier wrapper in SparkR.
[jira] [Commented] (SPARK-16445) Multilayer Perceptron Classifier wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394654#comment-15394654 ] Xin Ren commented on SPARK-16445: - I'm still working on it; hopefully by the end of this weekend I can submit a PR :) I just have a quick question: which parameters should be passed from the R command? For fit() of the wrapper class, there are many parameters: https://github.com/apache/spark/compare/master...keypointt:SPARK-16445?expand=1#diff-ccb8590441998a896d1b74ca605b56efR62 {code} def fit( formula: String, data: DataFrame, blockSize: Int, layers: Array[Int], initialWeights: Vector, solver: String, seed: Long, maxIter: Int, tol: Double, stepSize: Double ): MultilayerPerceptronClassifierWrapper = { {code} And for the R part, should I pass all the parameters from the R command? https://github.com/apache/spark/compare/master...keypointt:SPARK-16445?expand=1#diff-7ede1519b4a56647801b51af33c2dd18R461 I see in the example (http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier) that only the parameters below are being set; the rest just use default values: {code} val trainer = new MultilayerPerceptronClassifier() .setLayers(layers) .setBlockSize(128) .setSeed(1234L) .setMaxIter(100) {code} > Multilayer Perceptron Classifier wrapper in SparkR > -- > > Key: SPARK-16445 > URL: https://issues.apache.org/jira/browse/SPARK-16445 > Project: Spark > Issue Type: Sub-task > Components: MLlib, SparkR >Reporter: Xiangrui Meng >Assignee: Xin Ren > > Follow instructions in SPARK-16442 and implement multilayer perceptron > classifier wrapper in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
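One common way to answer this is to expose only the frequently tuned knobs from R and let the estimator's defaults cover the rest. A sketch under that assumption (the helper name and chosen defaults are illustrative, not the final wrapper API):

{code}
import org.apache.spark.ml.classification.{MultilayerPerceptronClassificationModel, MultilayerPerceptronClassifier}
import org.apache.spark.sql.DataFrame

// Sketch only: a reduced parameter surface, with defaults mirroring the
// programming-guide example; everything else falls back to estimator defaults.
def fitSketch(
    data: DataFrame,  // expects the usual "features"/"label" columns
    layers: Array[Int],
    blockSize: Int = 128,
    maxIter: Int = 100,
    seed: Long = 1234L): MultilayerPerceptronClassificationModel = {
  new MultilayerPerceptronClassifier()
    .setLayers(layers)
    .setBlockSize(blockSize)
    .setMaxIter(maxIter)
    .setSeed(seed)
    .fit(data)
}
{code}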
[jira] [Commented] (SPARK-16580) [WARN] class Accumulator in package spark is deprecated: use AccumulatorV2
[ https://issues.apache.org/jira/browse/SPARK-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15380169#comment-15380169 ] Xin Ren commented on SPARK-16580: - You are right, this one is hard; it's kind of all over the place. I'd like to have a try, but I'm not sure I can resolve it... I tried the modification here: https://github.com/keypointt/spark/commit/84db7265250eef147c8d51e539ace9f9dfc35a19 After compiling again, the warnings on "PythonRDD.scala:78" and "PythonRDD.scala:71" disappeared. But for "PythonRDD.scala:873: trait AccumulatorParam" and "AccumulatorV2.scala:459: trait AccumulableParam", I don't know what to do, since they support some legacy API calls > [WARN] class Accumulator in package spark is deprecated: use AccumulatorV2 > -- > > Key: SPARK-16580 > URL: https://issues.apache.org/jira/browse/SPARK-16580 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Minor > > When I was working on the R wrapper, I found the compile warn. > {code} > > project mllib > > console > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: > class Accumulator in package spark is deprecated: use AccumulatorV2 > [warn] accumulator: Accumulator[JList[Array[Byte]]]) > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: > class Accumulator in package spark is deprecated: use AccumulatorV2 > [warn] accumulator: Accumulator[JList[Array[Byte]]]) > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: > class Accumulator in package spark is deprecated: use AccumulatorV2 > [warn] accumulator: Accumulator[JList[Array[Byte]]]) > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:71: > class Accumulator in package spark is deprecated: use AccumulatorV2 > [warn] private[spark] case class PythonFunction( > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: > class Accumulator in package spark is deprecated: use AccumulatorV2 > [warn] accumulator: Accumulator[JList[Array[Byte]]]) > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:873: > trait AccumulatorParam in package spark is deprecated: use AccumulatorV2 > [warn] extends AccumulatorParam[JList[Array[Byte]]] { > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala:459: > trait AccumulableParam in package spark is deprecated: use AccumulatorV2 > [warn] param: org.apache.spark.AccumulableParam[R, T]) extends > AccumulatorV2[T, R] { > [warn] > [warn] > /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala:459: > trait AccumulableParam in package spark is deprecated: use AccumulatorV2 > [warn] param: org.apache.spark.AccumulableParam[R, T]) extends > AccumulatorV2[T, R] { > [warn] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
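For the `Accumulator[JList[Array[Byte]]]` case specifically, the V2 route is to implement `AccumulatorV2` directly rather than an `AccumulatorParam`. A rough sketch of the shape of that migration (not the actual PythonRDD change):

{code}
import java.util.{ArrayList => JArrayList, List => JList}
import org.apache.spark.util.AccumulatorV2

// AccumulatorV2 replaces the AccumulatorParam-based API with explicit
// isZero/copy/reset/add/merge/value methods.
class BytesAccumulator extends AccumulatorV2[Array[Byte], JList[Array[Byte]]] {
  private val buf = new JArrayList[Array[Byte]]()

  override def isZero: Boolean = buf.isEmpty
  override def copy(): BytesAccumulator = {
    val acc = new BytesAccumulator
    acc.buf.addAll(buf)
    acc
  }
  override def reset(): Unit = buf.clear()
  override def add(v: Array[Byte]): Unit = buf.add(v)
  override def merge(other: AccumulatorV2[Array[Byte], JList[Array[Byte]]]): Unit =
    buf.addAll(other.value)
  override def value: JList[Array[Byte]] = buf
}
{code}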
[jira] [Created] (SPARK-16580) [WARN] class Accumulator in package spark is deprecated: use AccumulatorV2
Xin Ren created SPARK-16580: --- Summary: [WARN] class Accumulator in package spark is deprecated: use AccumulatorV2 Key: SPARK-16580 URL: https://issues.apache.org/jira/browse/SPARK-16580 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.0.0 Reporter: Xin Ren Priority: Minor When I was working on the R wrapper, I found the compile warn. {code} > project mllib > console [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: class Accumulator in package spark is deprecated: use AccumulatorV2 [warn] accumulator: Accumulator[JList[Array[Byte]]]) [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: class Accumulator in package spark is deprecated: use AccumulatorV2 [warn] accumulator: Accumulator[JList[Array[Byte]]]) [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: class Accumulator in package spark is deprecated: use AccumulatorV2 [warn] accumulator: Accumulator[JList[Array[Byte]]]) [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:71: class Accumulator in package spark is deprecated: use AccumulatorV2 [warn] private[spark] case class PythonFunction( [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:78: class Accumulator in package spark is deprecated: use AccumulatorV2 [warn] accumulator: Accumulator[JList[Array[Byte]]]) [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:873: trait AccumulatorParam in package spark is deprecated: use AccumulatorV2 [warn] extends AccumulatorParam[JList[Array[Byte]]] { [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala:459: trait AccumulableParam in package spark is deprecated: use AccumulatorV2 [warn] param: org.apache.spark.AccumulableParam[R, T]) extends AccumulatorV2[T, R] { [warn] [warn] /Users/renxin/workspace/spark/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala:459: trait AccumulableParam in package spark is deprecated: use AccumulatorV2 [warn] param: org.apache.spark.AccumulableParam[R, T]) extends AccumulatorV2[T, R] { [warn] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16535) pom.xml warning: "Definition of groupId is redundant, because it's inherited from the parent"
[ https://issues.apache.org/jira/browse/SPARK-16535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16535: Attachment: Screen Shot 2016-07-13 at 3.13.11 PM.png > pom.xml warning: "Definition of groupId is redundant, because it's inherited > from the parent" > - > > Key: SPARK-16535 > URL: https://issues.apache.org/jira/browse/SPARK-16535 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Xin Ren >Priority: Minor > Attachments: Screen Shot 2016-07-13 at 3.13.11 PM.png > > > When I scanned through the pom.xml files of the sub-projects, I found the > warning below (screenshot attached): > {code} > Definition of groupId is redundant, because it's inherited from the parent > {code} > I've tried removing some of the lines with the groupId definition, and the > build on my local machine is still OK: > {code} > <groupId>org.apache.spark</groupId> > {code} > As I just found, Maven 3.3.9 is being used in > Spark 2.x, and Maven 3 supports versionless parent elements: since Maven 3.1 > there is no need to specify the parent version in sub-modules, which is great. > ref: > http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16535) pom.xml warning: "Definition of groupId is redundant, because it's inherited from the parent"
Xin Ren created SPARK-16535: --- Summary: pom.xml warning: "Definition of groupId is redundant, because it's inherited from the parent" Key: SPARK-16535 URL: https://issues.apache.org/jira/browse/SPARK-16535 Project: Spark Issue Type: Improvement Components: Build Reporter: Xin Ren Priority: Minor Attachments: Screen Shot 2016-07-13 at 3.13.11 PM.png When I scanned through the pom.xml files of the sub-projects, I found the warning below (screenshot attached): {code} Definition of groupId is redundant, because it's inherited from the parent {code} I've tried removing some of the lines with the groupId definition, and the build on my local machine is still OK: {code} <groupId>org.apache.spark</groupId> {code} As I just found, Maven 3.3.9 is being used in Spark 2.x, and Maven 3 supports versionless parent elements: since Maven 3.1 there is no need to specify the parent version in sub-modules, which is great. ref: http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
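For reference, the shape of the change in a child module pom is just dropping the redundant element and relying on inheritance; a sketch (the artifact ids below are illustrative):

{code}
<!-- Child module pom sketch: groupId is inherited from <parent>, so the
     child does not need to restate it. -->
<parent>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-parent_2.11</artifactId>
  <version>2.0.0-SNAPSHOT</version>
  <relativePath>../pom.xml</relativePath>
</parent>

<artifactId>spark-mllib_2.11</artifactId>
<!-- no <groupId> here: inherited from the parent -->
{code}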
[jira] [Closed] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren closed SPARK-16437. --- > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren resolved SPARK-16437. - Resolution: Not A Problem > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-16502) Update deprecated method "ParquetFileReader" from parquet
[ https://issues.apache.org/jira/browse/SPARK-16502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren closed SPARK-16502. --- Resolution: Invalid > Update deprecated method "ParquetFileReader" from parquet > -- > > Key: SPARK-16502 > URL: https://issues.apache.org/jira/browse/SPARK-16502 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Xin Ren > > During code compilation, I got the deprecation warning below. The > method invocation needs to be updated. > {code} > /Users/renxin/workspace/spark/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java > Warning:(140, 19) java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > Warning:(204, 19) java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16502) Update deprecated method "ParquetFileReader" from parquet
[ https://issues.apache.org/jira/browse/SPARK-16502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16502: Description: During code compilation, I got the deprecation warning below. The method invocation needs to be updated. {code} /Users/renxin/workspace/spark/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java Warning:(140, 19) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning:(204, 19) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated {code} was: During code compilation, I got the deprecation warning below. The method invocation needs to be updated. {code} /Users/renxin/workspace/spark/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java Warning:(140, 19) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning:(204, 19) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated {code} > Update deprecated method "ParquetFileReader" from parquet > -- > > Key: SPARK-16502 > URL: https://issues.apache.org/jira/browse/SPARK-16502 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Xin Ren > > During code compilation, I got the deprecation warning below. The > method invocation needs to be updated. > {code} > /Users/renxin/workspace/spark/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java > Warning:(140, 19) java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > Warning:(204, 19) java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16502) Update deprecated method "ParquetFileReader" from parquet
[ https://issues.apache.org/jira/browse/SPARK-16502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16502: Description: During code compilation, I got the deprecation warning below. The method invocation needs to be updated. {code} /Users/renxin/workspace/spark/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java Warning:(140, 19) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning:(204, 19) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated {code} was: During code compilation, I got the deprecation warning below. The method invocation needs to be updated. {code} /Users/quickmobile/workspace/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala Warning:(448, 28) method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information. ConversionPatterns.listType( ^ Warning:(464, 28) method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information. ConversionPatterns.listType( ^ {code} > Update deprecated method "ParquetFileReader" from parquet > -- > > Key: SPARK-16502 > URL: https://issues.apache.org/jira/browse/SPARK-16502 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Xin Ren > > During code compilation, I got the deprecation warning below. The > method invocation needs to be updated. > {code} > /Users/renxin/workspace/spark/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java > Warning:(140, 19) java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > Warning:(204, 19) java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
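On the `ParquetFileReader` side, later parquet-mr releases expose a static `open(...)` factory that avoids the deprecated constructor; a sketch under that assumption (the file path is illustrative, and availability should be verified against the parquet version actually on the classpath):

{code}
// Sketch, assuming a parquet-mr release that provides ParquetFileReader.open,
// so the deprecated (Configuration, Path, List, List) constructor goes away.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader

val reader = ParquetFileReader.open(new Configuration(), new Path("/tmp/example.parquet"))
try {
  // File metadata comes from the reader instead of being plumbed in manually.
  println(reader.getFileMetaData.getSchema)
} finally {
  reader.close()
}
{code}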
[jira] [Commented] (SPARK-16502) Update deprecated method "ParquetFileReader" from parquet
[ https://issues.apache.org/jira/browse/SPARK-16502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373494#comment-15373494 ] Xin Ren commented on SPARK-16502: - I'm working on it. > Update deprecated method "ParquetFileReader" from parquet > -- > > Key: SPARK-16502 > URL: https://issues.apache.org/jira/browse/SPARK-16502 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Xin Ren > > During code compilation, I got the deprecation warning below. The > method invocation needs to be updated. > {code} > /Users/quickmobile/workspace/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala > Warning:(448, 28) method listType in object ConversionPatterns is deprecated: > see corresponding Javadoc for more information. > ConversionPatterns.listType( >^ > Warning:(464, 28) method listType in object ConversionPatterns is deprecated: > see corresponding Javadoc for more information. > ConversionPatterns.listType( >^ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16502) Update deprecated method "ParquetFileReader" from parquet
Xin Ren created SPARK-16502: --- Summary: Update deprecated method "ParquetFileReader" from parquet Key: SPARK-16502 URL: https://issues.apache.org/jira/browse/SPARK-16502 Project: Spark Issue Type: Improvement Components: SQL Reporter: Xin Ren During code compilation, I got the deprecation warning below. The method invocation needs to be updated. {code} /Users/quickmobile/workspace/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala Warning:(448, 28) method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information. ConversionPatterns.listType( ^ Warning:(464, 28) method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information. ConversionPatterns.listType( ^ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
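The deprecation note in parquet-mr points toward `listOfElements` as the `listType` replacement. A sketch under that assumption (the schema names are made up; the exact signature should be double-checked against the parquet-mr version in use):

{code}
// Sketch: build a LIST-annotated group via the replacement named in the
// deprecation javadoc, instead of the deprecated ConversionPatterns.listType.
import org.apache.parquet.schema.{ConversionPatterns, Types}
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName
import org.apache.parquet.schema.Type.Repetition

val element = Types.optional(PrimitiveTypeName.INT32).named("element")
val listField = ConversionPatterns.listOfElements(Repetition.OPTIONAL, "my_list", element)
println(listField)
{code}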
[jira] [Comment Edited] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372210#comment-15372210 ] Xin Ren edited comment on SPARK-16437 at 7/12/16 6:04 PM: -- I worked on this for a couple of days, and I found it's not caused by Spark but by the parquet library "parquet-mr/parquet-hadoop". I've debugged step by step, and found this error comes from here: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L820 and after digging into "parquet-hadoop", it's most probably because this library is missing the slf4j binder: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L231 But it's technically not a bug, since Spark is using the latest versions of slf4j and parquet {code} slf4j 1.7.16, parquet 1.8.1 {code} and since 1.6 SLF4J defaults to a no-operation (NOP) logger implementation, so it should be OK. was (Author: iamshrek): I worked on this for a couple of days, and I found it's not caused by Spark but by the parquet library "parquet-mr/parquet-hadoop". I've debugged step by step, and found this error comes from here: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L820 and after digging into "parquet-hadoop", it's most probably because this library is missing the slf4j binder: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L231 But it's technically not a bug, since Spark is using slf4j {code}1.7.16{code}, and since 1.6 SLF4J defaults to a no-operation (NOP) logger implementation, so it should be OK. > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
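If one did want real log output during local debugging, the usual approach is to drop a single SLF4J binding onto the classpath; a sketch (the binding choice is illustrative, and the version simply mirrors the slf4j version mentioned above):

{code}
<!-- Sketch: any one SLF4J binding silences the NOP fallback; slf4j-simple
     is the smallest. Version here just mirrors Spark's slf4j version. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-simple</artifactId>
  <version>1.7.16</version>
</dependency>
{code}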
[jira] [Commented] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373338#comment-15373338 ] Xin Ren commented on SPARK-16437: - hi [~srowen], could you please have a look here? I think the SLF4J error in this ticket comes from the parquet library "parquet-mr/parquet-hadoop" and is not a Spark problem. But I still have some very tiny style changes; should I submit the PR or just drop it, since it's only a 2-line change? https://github.com/apache/spark/compare/master...keypointt:SPARK-16437?expand=1 thank you very much :) > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372211#comment-15372211 ] Xin Ren commented on SPARK-16437: - I did find some minor improvements during my debugging, though, and will submit a PR tomorrow. > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372210#comment-15372210 ] Xin Ren commented on SPARK-16437: - I worked on this for a couple of days, and I found it's not caused by Spark but by the parquet library "parquet-mr/parquet-hadoop". I've debugged step by step, and found this error comes from here: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L820 and after digging into "parquet-hadoop", it's most probably because this library is missing the slf4j binder: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L231 But it's technically not a bug, since Spark is using slf4j {code}1.7.16{code}, and since 1.6 SLF4J defaults to a no-operation (NOP) logger implementation, so it should be OK. > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16445) Multilayer Perceptron Classifier wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371385#comment-15371385 ] Xin Ren commented on SPARK-16445: - great to know, I'll start on it, thanks Xiangrui > Multilayer Perceptron Classifier wrapper in SparkR > -- > > Key: SPARK-16445 > URL: https://issues.apache.org/jira/browse/SPARK-16445 > Project: Spark > Issue Type: Sub-task > Components: MLlib, SparkR >Reporter: Xiangrui Meng >Assignee: Xin Ren > > Follow instructions in SPARK-16442 and implement multilayer perceptron > classifier wrapper in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368548#comment-15368548 ] Xin Ren commented on SPARK-16437: - It's SQL's problem I think, I'll remove the SparkR tag > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16437: Component/s: (was: SparkR) > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16437: Description: build SparkR with command {code} build/mvn -DskipTests -Psparkr package {code} start SparkR console {code} ./bin/sparkR {code} then get error {code} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ SparkSession available as 'spark'. > > > library(SparkR) > > df <- read.df("examples/src/main/resources/users.parquet") SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. > > > head(df) 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl name favorite_color favorite_numbers 1 Alyssa3, 9, 15, 20 2Benred NULL {code} Reference * seems need to add a lib from slf4j to point to older version http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder was: build SparkR with command {code} build/mvn -DskipTests -Psparkr package {code} start SparkR console {code} ./bin/sparkR {code} then get error {code} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ SparkSession available as 'spark'. > > > library(SparkR) > > df <- read.df("examples/src/main/resources/users.parquet") SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. > > > head(df) 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl name favorite_color favorite_numbers 1 Alyssa3, 9, 15, 20 2Benred NULL {code} seems need to add a lib from slf4j http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SparkR, SQL >Reporter: Xin Ren >Priority: Minor > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. 
> > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > Reference > * seems need to add a lib from slf4j to point to older version > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > * on slf4j official site: http://www.slf4j.org/codes.html#StaticLoggerBinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16437: Description: build SparkR with command {code} build/mvn -DskipTests -Psparkr package {code} start SparkR console {code} ./bin/sparkR {code} then get error {code} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ SparkSession available as 'spark'. > > > library(SparkR) > > df <- read.df("examples/src/main/resources/users.parquet") SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. > > > head(df) 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl name favorite_color favorite_numbers 1 Alyssa3, 9, 15, 20 2Benred NULL {code} seems need to add a lib from slf4j http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder was: start SparkR console {code} ./bin/sparkR {code} then get error {code} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ SparkSession available as 'spark'. > > > library(SparkR) > > df <- read.df("examples/src/main/resources/users.parquet") SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. > > > head(df) 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl name favorite_color favorite_numbers 1 Alyssa3, 9, 15, 20 2Benred NULL {code} seems need to add a lib from slf4j http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SparkR, SQL >Reporter: Xin Ren >Priority: Minor > Fix For: 2.0.0 > > > build SparkR with command > {code} > build/mvn -DskipTests -Psparkr package > {code} > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. 
> > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > seems need to add a lib from slf4j > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16445) Multilayer Perceptron Classifier wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368479#comment-15368479 ] Xin Ren commented on SPARK-16445: - Hi Xiangrui, may I have a try on this one? Is there a strict deadline to hit? Thanks a lot > Multilayer Perceptron Classifier wrapper in SparkR > -- > > Key: SPARK-16445 > URL: https://issues.apache.org/jira/browse/SPARK-16445 > Project: Spark > Issue Type: Sub-task > Components: MLlib, SparkR >Reporter: Xiangrui Meng > > Follow instructions in SPARK-16442 and implement multilayer perceptron > classifier wrapper in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
[ https://issues.apache.org/jira/browse/SPARK-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367310#comment-15367310 ] Xin Ren commented on SPARK-16437: - I'm working on it :) > SparkR read.df() from parquet got error: SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder" > > > Key: SPARK-16437 > URL: https://issues.apache.org/jira/browse/SPARK-16437 > Project: Spark > Issue Type: Bug > Components: SparkR, SQL >Reporter: Xin Ren >Priority: Minor > Fix For: 2.0.0 > > > start SparkR console > {code} > ./bin/sparkR > {code} > then get error > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > > > > > library(SparkR) > > > > df <- read.df("examples/src/main/resources/users.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > > > > > > head(df) > 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to > context is not a instance of TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > name favorite_color favorite_numbers > 1 Alyssa3, 9, 15, 20 > 2Benred NULL > {code} > seems need to add a lib from slf4j > http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16437) SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
Xin Ren created SPARK-16437: --- Summary: SparkR read.df() from parquet got error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Key: SPARK-16437 URL: https://issues.apache.org/jira/browse/SPARK-16437 Project: Spark Issue Type: Bug Components: SparkR, SQL Reporter: Xin Ren Priority: Minor Fix For: 2.0.0 start SparkR console {code} ./bin/sparkR {code} then get error {code} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ SparkSession available as 'spark'. > > > library(SparkR) > > df <- read.df("examples/src/main/resources/users.parquet") SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. > > > head(df) 16/07/07 23:20:54 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl name favorite_color favorite_numbers 1 Alyssa3, 9, 15, 20 2Benred NULL {code} seems need to add a lib from slf4j http://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16381) Update SQL examples and programming guide for R language binding
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366261#comment-15366261 ] Xin Ren commented on SPARK-16381: - Oh I see, thank you so much :) > Update SQL examples and programming guide for R language binding > > > Key: SPARK-16381 > URL: https://issues.apache.org/jira/browse/SPARK-16381 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Examples >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Xin Ren > > Please follow guidelines listed in this SPARK-16303 > [comment|https://issues.apache.org/jira/browse/SPARK-16303?focusedCommentId=15362575=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15362575]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16381) Update SQL examples and programming guide for R language binding
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365597#comment-15365597 ] Xin Ren commented on SPARK-16381: - Hi Cheng, do you mind telling me where to find the RC date or release schedule? I tried here https://issues.apache.org/jira/browse/SPARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:versions-panel, but didn't find much information > Update SQL examples and programming guide for R language binding > > > Key: SPARK-16381 > URL: https://issues.apache.org/jira/browse/SPARK-16381 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Examples >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Xin Ren > > Please follow guidelines listed in this SPARK-16303 > [comment|https://issues.apache.org/jira/browse/SPARK-16303?focusedCommentId=15362575=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15362575]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16381) Update SQL examples and programming guide for R language binding
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364664#comment-15364664 ] Xin Ren commented on SPARK-16381: - I can work on this :) Is there a strict deadline, like needing to be finished within a couple of days? > Update SQL examples and programming guide for R language binding > > > Key: SPARK-16381 > URL: https://issues.apache.org/jira/browse/SPARK-16381 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Examples >Affects Versions: 2.0.0 >Reporter: Cheng Lian > > Please follow guidelines listed in this SPARK-16303 > [comment|https://issues.apache.org/jira/browse/SPARK-16303?focusedCommentId=15362575=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15362575]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16233) test_sparkSQL.R is failing
[ https://issues.apache.org/jira/browse/SPARK-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360058#comment-15360058 ] Xin Ren commented on SPARK-16233: - Oh sorry I thought you guys would take over so I stopped working on this one. Thanks a lot resolving this (y) > test_sparkSQL.R is failing > -- > > Key: SPARK-16233 > URL: https://issues.apache.org/jira/browse/SPARK-16233 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.0.0 >Reporter: Xin Ren >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 2.0.0 > > > By running > {code} > ./R/run-tests.sh > {code} > Getting error: > {code} > xin:spark xr$ ./R/run-tests.sh > Warning: Ignoring non-spark config property: SPARK_SCALA_VERSION=2.11 > Loading required package: methods > Attaching package: ‘SparkR’ > The following object is masked from ‘package:testthat’: > describe > The following objects are masked from ‘package:stats’: > cov, filter, lag, na.omit, predict, sd, var, window > The following objects are masked from ‘package:base’: > as.data.frame, colnames, colnames<-, drop, endsWith, intersect, > rank, rbind, sample, startsWith, subset, summary, transform, union > binary functions: ... > functions on binary files: > broadcast variables: .. > functions in client.R: . > test functions in sparkR.R: .Re-using existing Spark Context. Call > sparkR.session.stop() or restart R to create a new Spark Context > Re-using existing Spark Context. Call sparkR.session.stop() or restart R > to create a new Spark Context > ... > include an external JAR in SparkContext: Warning: Ignoring non-spark config > property: SPARK_SCALA_VERSION=2.11 > .. > include R packages: > MLlib functions: .SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > .27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: > Compression: SNAPPY > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet block size to 134217728 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet page size to 1048576 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet dictionary page size to 1048576 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Dictionary is on > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Validation is off > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Writer version is: PARQUET_1_0 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Maximum row group padding size is 0 bytes > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem > columnStore to file. 
allocated memory: 65,622 > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 70B for [label] > BINARY: 1 values, 21B raw, 23B comp, 1 pages, encodings: [PLAIN, RLE, > BIT_PACKED] > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 87B for [terms, > list, element, list, element] BINARY: 2 values, 42B raw, 43B comp, 1 pages, > encodings: [PLAIN, RLE] > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 30B for > [hasIntercept] BOOLEAN: 1 values, 1B raw, 3B comp, 1 pages, encodings: > [PLAIN, BIT_PACKED] > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: > Compression: SNAPPY > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet block size to 134217728 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet page size to 1048576 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet dictionary page size to 1048576 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Dictionary is on > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Validation is off > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Writer version is: PARQUET_1_0 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Maximum row group padding size is 0 bytes > 27-Jun-2016 1:51:26 PM INFO: > org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem > columnStore to file. allocated memory: 49 > 27-Jun-2016 1:51:26 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 90B for [labels, > list, element] BINARY: 3
[jira] [Comment Edited] (SPARK-16144) Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict
[ https://issues.apache.org/jira/browse/SPARK-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355648#comment-15355648 ] Xin Ren edited comment on SPARK-16144 at 6/29/16 6:57 PM: -- Sure, thanks Xiangrui :) was (Author: iamshrek): Sure, thank Xiangrui :) > Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict > - > > Key: SPARK-16144 > URL: https://issues.apache.org/jira/browse/SPARK-16144 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Yanbo Liang > > After we grouped generic methods by the algorithm, it would be nice to add a > separate Rd for each ML generic method, in particular write.ml, read.ml, > summary, and predict, and link the implementations with seealso. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
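The change being discussed would presumably look something like this roxygen2 sketch, with a standalone Rd for the generic and {{@seealso}} links back to the per-algorithm implementations; the exact tags and wording here are illustrative assumptions, not the merged patch:

{code}
#' Saves the MLlib model to the input path
#'
#' @rdname write.ml
#' @name write.ml
#' @export
#' @seealso \link{spark.glm}, \link{spark.kmeans}, \link{spark.naiveBayes}
#' @seealso \link{read.ml}
setGeneric("write.ml", function(object, path, ...) {
  standardGeneric("write.ml")
})
{code}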
[jira] [Commented] (SPARK-16144) Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict
[ https://issues.apache.org/jira/browse/SPARK-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355648#comment-15355648 ] Xin Ren commented on SPARK-16144: - Sure, thank Xiangrui :) > Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict > - > > Key: SPARK-16144 > URL: https://issues.apache.org/jira/browse/SPARK-16144 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Yanbo Liang > > After we grouped generic methods by the algorithm, it would be nice to add a > separate Rd for each ML generic method, in particular write.ml, read.ml, > summary, and predict, and link the implementations with seealso. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16233) test_sparkSQL.R is failing
[ https://issues.apache.org/jira/browse/SPARK-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353352#comment-15353352 ] Xin Ren commented on SPARK-16233: - Actually I was just following the docs here https://github.com/keypointt/spark/tree/master/R#examples-unit-tests Maybe we should update the docs to point out that "-Phive" could be needed? {code} build/mvn -DskipTests -Psparkr package {code} {code} You can also run the unit tests for SparkR by running the script below. You need to install the testthat package first: R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")' ./R/run-tests.sh {code} > test_sparkSQL.R is failing > -- > > Key: SPARK-16233 > URL: https://issues.apache.org/jira/browse/SPARK-16233 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Minor > > By running > {code} > ./R/run-tests.sh > {code} > Getting error: > {code} > xin:spark xr$ ./R/run-tests.sh > Warning: Ignoring non-spark config property: SPARK_SCALA_VERSION=2.11 > Loading required package: methods > Attaching package: ‘SparkR’ > The following object is masked from ‘package:testthat’: > describe > The following objects are masked from ‘package:stats’: > cov, filter, lag, na.omit, predict, sd, var, window > The following objects are masked from ‘package:base’: > as.data.frame, colnames, colnames<-, drop, endsWith, intersect, > rank, rbind, sample, startsWith, subset, summary, transform, union > binary functions: ... > functions on binary files: > broadcast variables: .. > functions in client.R: . > test functions in sparkR.R: .Re-using existing Spark Context. Call > sparkR.session.stop() or restart R to create a new Spark Context > Re-using existing Spark Context. Call sparkR.session.stop() or restart R > to create a new Spark Context > ... > include an external JAR in SparkContext: Warning: Ignoring non-spark config > property: SPARK_SCALA_VERSION=2.11 > .. > include R packages: > MLlib functions: .SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > .27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: > Compression: SNAPPY > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet block size to 134217728 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet page size to 1048576 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet dictionary page size to 1048576 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Dictionary is on > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Validation is off > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Writer version is: PARQUET_1_0 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Maximum row group padding size is 0 bytes > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem > columnStore to file. 
allocated memory: 65,622 > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 70B for [label] > BINARY: 1 values, 21B raw, 23B comp, 1 pages, encodings: [PLAIN, RLE, > BIT_PACKED] > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 87B for [terms, > list, element, list, element] BINARY: 2 values, 42B raw, 43B comp, 1 pages, > encodings: [PLAIN, RLE] > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 30B for > [hasIntercept] BOOLEAN: 1 values, 1B raw, 3B comp, 1 pages, encodings: > [PLAIN, BIT_PACKED] > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: > Compression: SNAPPY > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet block size to 134217728 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet page size to 1048576 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet dictionary page size to 1048576 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Dictionary is on > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Validation is off > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Writer version is: PARQUET_1_0 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Maximum row group padding size is 0 bytes >
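For reference, the Hive-enabled build being suggested in this thread would presumably be the same Maven command with the standard {{-Phive}} profile added; this is an assumption based on the usual Spark build profiles, not a command quoted in the ticket:

{code}
# build SparkR support together with Hive support
build/mvn -DskipTests -Psparkr -Phive package
{code}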
[jira] [Commented] (SPARK-16233) test_sparkSQL.R is failing
[ https://issues.apache.org/jira/browse/SPARK-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353340#comment-15353340 ] Xin Ren commented on SPARK-16233: - this is what I used to build sparkR, should I add "-Phive"? sorry I'm new to this part. {code} build/mvn -DskipTests -Psparkr package {code} > test_sparkSQL.R is failing > -- > > Key: SPARK-16233 > URL: https://issues.apache.org/jira/browse/SPARK-16233 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Minor > > By running > {code} > ./R/run-tests.sh > {code} > Getting error: > {code} > xin:spark xr$ ./R/run-tests.sh > Warning: Ignoring non-spark config property: SPARK_SCALA_VERSION=2.11 > Loading required package: methods > Attaching package: ‘SparkR’ > The following object is masked from ‘package:testthat’: > describe > The following objects are masked from ‘package:stats’: > cov, filter, lag, na.omit, predict, sd, var, window > The following objects are masked from ‘package:base’: > as.data.frame, colnames, colnames<-, drop, endsWith, intersect, > rank, rbind, sample, startsWith, subset, summary, transform, union > binary functions: ... > functions on binary files: > broadcast variables: .. > functions in client.R: . > test functions in sparkR.R: .Re-using existing Spark Context. Call > sparkR.session.stop() or restart R to create a new Spark Context > Re-using existing Spark Context. Call sparkR.session.stop() or restart R > to create a new Spark Context > ... > include an external JAR in SparkContext: Warning: Ignoring non-spark config > property: SPARK_SCALA_VERSION=2.11 > .. > include R packages: > MLlib functions: .SLF4J: Failed to load class > "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > .27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: > Compression: SNAPPY > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet block size to 134217728 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet page size to 1048576 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet dictionary page size to 1048576 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Dictionary is on > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Validation is off > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Writer version is: PARQUET_1_0 > 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Maximum row group padding size is 0 bytes > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem > columnStore to file. 
allocated memory: 65,622 > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 70B for [label] > BINARY: 1 values, 21B raw, 23B comp, 1 pages, encodings: [PLAIN, RLE, > BIT_PACKED] > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 87B for [terms, > list, element, list, element] BINARY: 2 values, 42B raw, 43B comp, 1 pages, > encodings: [PLAIN, RLE] > 27-Jun-2016 1:51:25 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 30B for > [hasIntercept] BOOLEAN: 1 values, 1B raw, 3B comp, 1 pages, encodings: > [PLAIN, BIT_PACKED] > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: > Compression: SNAPPY > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet block size to 134217728 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet page size to 1048576 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Parquet dictionary page size to 1048576 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Dictionary is on > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Validation is off > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Writer version is: PARQUET_1_0 > 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: > Maximum row group padding size is 0 bytes > 27-Jun-2016 1:51:26 PM INFO: > org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem > columnStore to file. allocated memory: 49 > 27-Jun-2016 1:51:26 PM INFO: > org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 90B for [labels, > list, element] BINARY: 3 values, 50B raw, 50B comp, 1 pages,
[jira] [Commented] (SPARK-16144) Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict
[ https://issues.apache.org/jira/browse/SPARK-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352436#comment-15352436 ] Xin Ren commented on SPARK-16144: - Sorry, still trying to solve the merge conflicts; should be close to finishing... > Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict > - > > Key: SPARK-16144 > URL: https://issues.apache.org/jira/browse/SPARK-16144 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Xin Ren > > After we grouped generic methods by the algorithm, it would be nice to add a > separate Rd for each ML generic method, in particular write.ml, read.ml, > summary, and predict, and link the implementations with seealso. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16233) test_sparkSQL.R is failing
[ https://issues.apache.org/jira/browse/SPARK-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-16233: Description: By running {code} ./R/run-tests.sh {code} Getting error: {code} xin:spark xr$ ./R/run-tests.sh Warning: Ignoring non-spark config property: SPARK_SCALA_VERSION=2.11 Loading required package: methods Attaching package: ‘SparkR’ The following object is masked from ‘package:testthat’: describe The following objects are masked from ‘package:stats’: cov, filter, lag, na.omit, predict, sd, var, window The following objects are masked from ‘package:base’: as.data.frame, colnames, colnames<-, drop, endsWith, intersect, rank, rbind, sample, startsWith, subset, summary, transform, union binary functions: ... functions on binary files: broadcast variables: .. functions in client.R: . test functions in sparkR.R: .Re-using existing Spark Context. Call sparkR.session.stop() or restart R to create a new Spark Context Re-using existing Spark Context. Call sparkR.session.stop() or restart R to create a new Spark Context ... include an external JAR in SparkContext: Warning: Ignoring non-spark config property: SPARK_SCALA_VERSION=2.11 .. include R packages: MLlib functions: .SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. .27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Dictionary is on 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Validation is off 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Maximum row group padding size is 0 bytes 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. 
allocated memory: 65,622 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 70B for [label] BINARY: 1 values, 21B raw, 23B comp, 1 pages, encodings: [PLAIN, RLE, BIT_PACKED] 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 87B for [terms, list, element, list, element] BINARY: 2 values, 42B raw, 43B comp, 1 pages, encodings: [PLAIN, RLE] 27-Jun-2016 1:51:25 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 30B for [hasIntercept] BOOLEAN: 1 values, 1B raw, 3B comp, 1 pages, encodings: [PLAIN, BIT_PACKED] 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Dictionary is on 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Validation is off 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Maximum row group padding size is 0 bytes 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 49 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 90B for [labels, list, element] BINARY: 3 values, 50B raw, 50B comp, 1 pages, encodings: [PLAIN, RLE] 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Dictionary is on 27-Jun-2016 1:51:26 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Validation is off 27-Jun-2016 1:51:26 PM INFO:
[jira] [Commented] (SPARK-16233) test_sparkSQL.R is failing
[ https://issues.apache.org/jira/browse/SPARK-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351703#comment-15351703 ] Xin Ren commented on SPARK-16233: - I'm working on this > test_sparkSQL.R is failing > -- > > Key: SPARK-16233 > URL: https://issues.apache.org/jira/browse/SPARK-16233 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Minor > > By running > {code} > ./R/run-tests.sh > {code} > Getting error: > {code} > 15. Error: create DataFrame from list or data.frame (@test_sparkSQL.R#277) > - > java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/PreInsertCastAndRename$ > at > org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:69) > at > org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63) > at > org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) > at > org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:533) > at > org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:293) > at org.apache.spark.sql.api.r.SQLUtils$.createDF(SQLUtils.scala:135) > at org.apache.spark.sql.api.r.SQLUtils.createDF(SQLUtils.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141) > at > org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86) > at > org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) > at java.lang.Thread.run(Thread.java:745) > 1: createDataFrame(l, c("a", "b")) at > /Users/quickmobile/workspace/spark/R/lib/SparkR/tests/testthat/test_sparkSQL.R:277 > 2: dispatchFunc("createDataFrame(data, schema = NULL, samplingRatio = 1.0)", > x, ...) > 3: f(x, ...) > 4: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "createDF", srdd, > schema$jobj, >sparkSession) > 5: invokeJava(isStatic = TRUE, className, methodName, ...) > 6: stop(readString(conn)) > DONE > === > Execution halted > {code} > Cause: most probably these tests are still using the deprecated > 'createDataFrame(sqlContext, ...)' form; the test method invocations should be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For
[jira] [Created] (SPARK-16233) test_sparkSQL.R is failing
Xin Ren created SPARK-16233: --- Summary: test_sparkSQL.R is failing Key: SPARK-16233 URL: https://issues.apache.org/jira/browse/SPARK-16233 Project: Spark Issue Type: Bug Components: SparkR, Tests Affects Versions: 2.0.0 Reporter: Xin Ren Priority: Minor By running {code} ./R/run-tests.sh {code} Getting error: {code} 15. Error: create DataFrame from list or data.frame (@test_sparkSQL.R#277) - java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/PreInsertCastAndRename$ at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:69) at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63) at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:533) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:293) at org.apache.spark.sql.api.r.SQLUtils$.createDF(SQLUtils.scala:135) at org.apache.spark.sql.api.r.SQLUtils.createDF(SQLUtils.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141) at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86) at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) at java.lang.Thread.run(Thread.java:745) 1: createDataFrame(l, c("a", "b")) at /Users/quickmobile/workspace/spark/R/lib/SparkR/tests/testthat/test_sparkSQL.R:277 2: dispatchFunc("createDataFrame(data, schema = NULL, samplingRatio = 1.0)", x, ...) 3: f(x, ...) 4: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "createDF", srdd, schema$jobj, sparkSession) 5: invokeJava(isStatic = TRUE, className, methodName, ...) 6: stop(readString(conn)) DONE === Execution halted {code} Cause: most probably these tests are still using the deprecated 'createDataFrame(sqlContext, ...)' form; the test method invocations should be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
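As a sketch of the fix that cause suggests, the 1.x-style calls would be migrated to the Spark 2.0 API, roughly as follows; the variable names are illustrative and the migration is an assumption based on the deprecation noted above:

{code}
# old 1.x-style invocation that the failing tests were presumably still using
df <- createDataFrame(sqlContext, l, c("a", "b"))

# 2.0-style invocation: start a session first, then drop the sqlContext argument
sparkR.session()
df <- createDataFrame(l, c("a", "b"))
{code}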
[jira] [Commented] (SPARK-16140) Group k-means method in generated doc
[ https://issues.apache.org/jira/browse/SPARK-16140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346776#comment-15346776 ] Xin Ren commented on SPARK-16140: - OK, I'll target to finish it this weekend. Thanks for the tips, I'll keep it concise and clean. > Group k-means method in generated doc > - > > Key: SPARK-16140 > URL: https://issues.apache.org/jira/browse/SPARK-16140 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Xin Ren > Labels: starter > > Follow SPARK-16107 and group the doc of spark.kmeans, predict(KM), > summary(KM), read/write.ml(KM) under Rd spark.kmeans. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
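The grouping described in this ticket would presumably rely on roxygen2's {{@rdname}} tag so that the methods share one Rd page; a rough illustrative sketch, where the method body is a simplified assumption about SparkR's internals:

{code}
# grouping predict(KM) under the spark.kmeans Rd via @rdname
#' @rdname spark.kmeans
#' @export
setMethod("predict", signature(object = "KMeansModel"),
          function(object, newData) {
            # delegate prediction to the JVM-side model wrapper
            dataFrame(callJMethod(object@jobj, "transform", newData@sdf))
          })
{code}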
[jira] [Commented] (SPARK-16140) Group k-means method in generated doc
[ https://issues.apache.org/jira/browse/SPARK-16140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345254#comment-15345254 ] Xin Ren commented on SPARK-16140: - Maybe I can take this one as a warm-up, thanks Xiangrui :) > Group k-means method in generated doc > - > > Key: SPARK-16140 > URL: https://issues.apache.org/jira/browse/SPARK-16140 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng > Labels: starter > > Follow SPARK-16107 and group the doc of spark.kmeans, predict(KM), > summary(KM), read/write.ml(KM) under Rd spark.kmeans. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16147) Add package docs to packages under spark.ml
[ https://issues.apache.org/jira/browse/SPARK-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345246#comment-15345246 ] Xin Ren commented on SPARK-16147: - Hi Xiangrui, I can help with this if you need more hands. :) > Add package docs to packages under spark.ml > --- > > Key: SPARK-16147 > URL: https://issues.apache.org/jira/browse/SPARK-16147 > Project: Spark > Issue Type: Documentation > Components: Documentation, MLlib >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > Some packages do not have package docs. It would improve the documentation if > we wrote a short summary for each package. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15829) spark master webpage links to application UI broke when running in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-15829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325073#comment-15325073 ] Xin Ren commented on SPARK-15829: - Sorry Andy, my bad. I'm running on port 7077 and in client mode. > spark master webpage links to application UI broke when running in cluster > mode > --- > > Key: SPARK-15829 > URL: https://issues.apache.org/jira/browse/SPARK-15829 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.6.1 > Environment: AWS ec2 cluster >Reporter: Andrew Davidson >Priority: Critical > > Hi > I created a cluster using the spark-1.6.1-bin-hadoop2.6/ec2/spark-ec2 > I use the standalone cluster manager. I have a streaming app running in > cluster mode. I notice the master webpage links to the application UI page > are incorrect. > It does not look like JIRA will let me upload images. I'll try and describe > the web pages and the bug. > My master is running on > http://ec2-54-215-230-73.us-west-1.compute.amazonaws.com:8080/ > It has a section marked "applications". If I click on one of the running > application ids I am taken to a page showing "Executor Summary". This page > has a link to the 'application detail UI'; the url is > http://ec2-54-215-230-73.us-west-1.compute.amazonaws.com:4041/ > Notice it thinks the application UI is running on the cluster master. > It is actually running on the same machine as the driver on port 4041. I was > able to reverse engineer the url by noticing the private ip address is part of > the worker id. For example worker-20160322041632-172.31.23.201-34909 > Next I went to the AWS EC2 console to find the public DNS name for this > machine > http://ec2-54-193-104-169.us-west-1.compute.amazonaws.com:4041/streaming/ > Kind regards > Andy -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15829) spark master webpage links to application UI broke when running in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-15829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324789#comment-15324789 ] Xin Ren commented on SPARK-15829: - Hi Andy, maybe you want to check your port configuration to make sure the port is not in use. I just tried it on my cluster, which is also on EC2 with v1.6.1, and the 'application detail UI' link is working properly. Just for your information. > spark master webpage links to application UI broke when running in cluster > mode > --- > > Key: SPARK-15829 > URL: https://issues.apache.org/jira/browse/SPARK-15829 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.6.1 > Environment: AWS ec2 cluster >Reporter: Andrew Davidson >Priority: Critical > > Hi > I created a cluster using the spark-1.6.1-bin-hadoop2.6/ec2/spark-ec2 > I use the standalone cluster manager. I have a streaming app running in > cluster mode. I notice the master webpage links to the application UI page > are incorrect. > It does not look like JIRA will let me upload images. I'll try and describe > the web pages and the bug. > My master is running on > http://ec2-54-215-230-73.us-west-1.compute.amazonaws.com:8080/ > It has a section marked "applications". If I click on one of the running > application ids I am taken to a page showing "Executor Summary". This page > has a link to the 'application detail UI'; the url is > http://ec2-54-215-230-73.us-west-1.compute.amazonaws.com:4041/ > Notice it thinks the application UI is running on the cluster master. > It is actually running on the same machine as the driver on port 4041. I was > able to reverse engineer the url by noticing the private ip address is part of > the worker id. For example worker-20160322041632-172.31.23.201-34909 > Next I went to the AWS EC2 console to find the public DNS name for this > machine > http://ec2-54-193-104-169.us-west-1.compute.amazonaws.com:4041/streaming/ > Kind regards > Andy -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
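To act on the port suggestion above, one quick check, assuming lsof is available on the box, is to see what is already bound to the web UI ports:

{code}
# list processes bound to the default application UI port and the next fallback
lsof -i :4040 -i :4041
{code}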
[jira] [Updated] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-15509: Description: Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features" column, which conflicts with the existing "features" column from the LibSVM loader. E.g., using the "mnist" dataset from LibSVM: {code} training <- loadDF(sqlContext, ".../mnist", "libsvm") model <- spark.naiveBayes(label ~ features, training) {code} This fails with: {code} 16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed Error in invokeJava(isStatic = TRUE, className, methodName, ...) : java.lang.IllegalArgumentException: Output column features already exists. at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca {code} The same issue appears for the "label" column once you rename the "features" column. was: Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features" column, which conflicts with the existing "features" column from the LibSVM loader. E.g., using the "mnist" dataset from LibSVM: {code} training <- loadDF(sqlContext, ".../mnist", "libsvm") model <- naiveBayes(label ~ features, training) {code} This fails with: {code} 16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed Error in invokeJava(isStatic = TRUE, className, methodName, ...) : java.lang.IllegalArgumentException: Output column features already exists. at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca {code} The same issue appears for the "label" column once you rename the "features" column. 
> R MLlib algorithms should support input columns "features" and "label" > -- > > Key: SPARK-15509 > URL: https://issues.apache.org/jira/browse/SPARK-15509 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Joseph K. Bradley > > Currently in SparkR, when you load a LibSVM dataset using the sqlContext and > then pass it to an MLlib algorithm, the ML wrappers will fail since they will > try to create a "features" column, which conflicts with the existing > "features" column from the LibSVM loader. E.g., using the "mnist" dataset > from LibSVM: > {code} > training <- loadDF(sqlContext, ".../mnist", "libsvm") > model <- spark.naiveBayes(label ~ features, training) > {code} > This fails with: > {code} > 16/05/24 11:52:41 ERROR RBackendHandler: fit on > org.apache.spark.ml.r.NaiveBayesWrapper failed > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : > java.lang.IllegalArgumentException: Output column features already exists. > at >
[jira] [Updated] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-15509: Description: Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features" column, which conflicts with the existing "features" column from the LibSVM loader. E.g., using the "mnist" dataset from LibSVM: {code} training <- loadDF(sqlContext, ".../mnist", "libsvm") model <- naiveBayes(label ~ features, training) {code} This fails with: {code} 16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed Error in invokeJava(isStatic = TRUE, className, methodName, ...) : java.lang.IllegalArgumentException: Output column features already exists. at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca {code} The same issue appears for the "label" column once you rename the "features" column. was: Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features" column, which conflicts with the existing "features" column from the LibSVM loader. E.g., using the "mnist" dataset from LibSVM: {code} training <- loadDF(sqlContext, ".../mnist", "libsvm") model <- spark.naiveBayes(label ~ features, training) {code} This fails with: {code} 16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed Error in invokeJava(isStatic = TRUE, className, methodName, ...) : java.lang.IllegalArgumentException: Output column features already exists. at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca {code} The same issue appears for the "label" column once you rename the "features" column. 
> R MLlib algorithms should support input columns "features" and "label" > -- > > Key: SPARK-15509 > URL: https://issues.apache.org/jira/browse/SPARK-15509 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Joseph K. Bradley > > Currently in SparkR, when you load a LibSVM dataset using the sqlContext and > then pass it to an MLlib algorithm, the ML wrappers will fail since they will > try to create a "features" column, which conflicts with the existing > "features" column from the LibSVM loader. E.g., using the "mnist" dataset > from LibSVM: > {code} > training <- loadDF(sqlContext, ".../mnist", "libsvm") > model <- naiveBayes(label ~ features, training) > {code} > This fails with: > {code} > 16/05/24 11:52:41 ERROR RBackendHandler: fit on > org.apache.spark.ml.r.NaiveBayesWrapper failed > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : > java.lang.IllegalArgumentException: Output column features already exists. > at >
[jira] [Commented] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306977#comment-15306977 ] Xin Ren commented on SPARK-15509: - I can reproduce the error here now, sorry for bothering Joseph > R MLlib algorithms should support input columns "features" and "label" > -- > > Key: SPARK-15509 > URL: https://issues.apache.org/jira/browse/SPARK-15509 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Joseph K. Bradley > > Currently in SparkR, when you load a LibSVM dataset using the sqlContext and > then pass it to an MLlib algorithm, the ML wrappers will fail since they will > try to create a "features" column, which conflicts with the existing > "features" column from the LibSVM loader. E.g., using the "mnist" dataset > from LibSVM: > {code} > training <- loadDF(sqlContext, ".../mnist", "libsvm") > model <- naiveBayes(label ~ features, training) > {code} > This fails with: > {code} > 16/05/24 11:52:41 ERROR RBackendHandler: fit on > org.apache.spark.ml.r.NaiveBayesWrapper failed > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : > java.lang.IllegalArgumentException: Output column features already exists. > at > org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) > at > scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) > at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) > at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) > at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) > at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) > at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) > at > org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) > at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca > {code} > The same issue appears for the "label" column once you rename the "features" > column. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
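Until the wrappers handle the name clash, one hypothetical workaround sketch is to rename the LibSVM loader's columns before fitting, so RFormula can create its own "features" and "label" output columns; the new column names and the (data, formula) argument order of spark.naiveBayes are assumptions here:

{code}
# rename the loader's columns so they no longer collide with RFormula's outputs
training2 <- withColumnRenamed(training, "features", "featuresVec")
training2 <- withColumnRenamed(training2, "label", "target")
model <- spark.naiveBayes(training2, target ~ featuresVec)
{code}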
[jira] [Commented] (SPARK-15645) Fix some typos of Streaming module
[ https://issues.apache.org/jira/browse/SPARK-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306092#comment-15306092 ] Xin Ren commented on SPARK-15645: - Thank you very much for this explanation Sean, I'll try to avoid this kind of JIRA in the future. > Fix some typos of Streaming module > -- > > Key: SPARK-15645 > URL: https://issues.apache.org/jira/browse/SPARK-15645 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Trivial > > No code change, just some typo fixing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-15645) Fix some typos of Streaming module
[ https://issues.apache.org/jira/browse/SPARK-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305971#comment-15305971 ] Xin Ren edited comment on SPARK-15645 at 5/29/16 4:21 PM: -- Sorry... in this case, what should be done for very trivial things? Just open a PR without a JIRA ticket? was (Author: iamshrek): sorry...in this case what should be done for some very trivial things? > Fix some typos of Streaming module > -- > > Key: SPARK-15645 > URL: https://issues.apache.org/jira/browse/SPARK-15645 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Trivial > > No code change, just some typo fixing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15645) Fix some typos of Streaming module
[ https://issues.apache.org/jira/browse/SPARK-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305971#comment-15305971 ] Xin Ren commented on SPARK-15645: - sorry...in this case what should be done for some very trivial things? > Fix some typos of Streaming module > -- > > Key: SPARK-15645 > URL: https://issues.apache.org/jira/browse/SPARK-15645 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Xin Ren >Priority: Trivial > > No code change, just some typo fixing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15645) Fix some typos of Streaming module
Xin Ren created SPARK-15645: --- Summary: Fix some typos of Streaming module Key: SPARK-15645 URL: https://issues.apache.org/jira/browse/SPARK-15645 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 2.0.0 Reporter: Xin Ren Priority: Trivial No code change, just some typo fixing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304777#comment-15304777 ] Xin Ren commented on SPARK-15509: - Hi [~josephkb], I tried many times but cannot reproduce your error message here. I tried R naiveBayes package and also spark.naiveBayes, but both got {code} naiveBayes formula interface handles data frames or arrays only {code} below is what I did: {code} ./bin/sparkR --master "local[2]" > training <- loadDF(sqlContext, "data/mllib/sample_libsvm_data.txt", "libsvm") > model <- spark.naiveBayes(label ~ features, training) Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘spark.naiveBayes’ for signature ‘"formula", "SparkDataFrame"’ > model <- naiveBayes(label ~ features, training) Error in naiveBayes.formula(label ~ features, training) : naiveBayes formula interface handles data frames or arrays only {code} then I tried example here and it's working http://spark.apache.org/docs/latest/sparkr.html#gaussian-glm-model {code} df <- createDataFrame(sqlContext, iris) model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df, family = "gaussian") {code} so I compare these 2 examples, and features are 'vector' type and df above is normal columns. {code} > df SparkDataFrame[Sepal_Length:double, Sepal_Width:double, Petal_Length:double, Petal_Width:double, Species:string] > training SparkDataFrame[label:double, features:vector] {code} I also downloaded "mnist" dataset LibSVM, and same error. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#mnist Is there anything I'm doing wrong? I'm using R package of naiveBayes (http://www.inside-r.org/packages/cran/e1071/docs/naivebayes), maybe I'm using the wrong package? Thank you very much Joseph. > R MLlib algorithms should support input columns "features" and "label" > -- > > Key: SPARK-15509 > URL: https://issues.apache.org/jira/browse/SPARK-15509 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Joseph K. Bradley > > Currently in SparkR, when you load a LibSVM dataset using the sqlContext and > then pass it to an MLlib algorithm, the ML wrappers will fail since they will > try to create a "features" column, which conflicts with the existing > "features" column from the LibSVM loader. E.g., using the "mnist" dataset > from LibSVM: > {code} > training <- loadDF(sqlContext, ".../mnist", "libsvm") > model <- naiveBayes(label ~ features, training) > {code} > This fails with: > {code} > 16/05/24 11:52:41 ERROR RBackendHandler: fit on > org.apache.spark.ml.r.NaiveBayesWrapper failed > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : > java.lang.IllegalArgumentException: Output column features already exists. 
> at > org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) > at > scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) > at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) > at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) > at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) > at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) > at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) > at > org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) > at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca > {code} > The same issue appears for the "label" column once you rename the "features" > column. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
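Incidentally, the dispatch failure in the session above ("unable to find an inherited method ... for signature '"formula", "SparkDataFrame"'") suggests that spark.naiveBayes expects the SparkDataFrame first and the formula second, so the reproduction call would presumably need to be:

{code}
# argument order implied by the dispatch error: data first, then formula
model <- spark.naiveBayes(training, label ~ features)
{code}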
[jira] [Commented] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302932#comment-15302932 ] Xin Ren commented on SPARK-15509: - Sure I'll try to finish by end of this week, thanks Joseph > R MLlib algorithms should support input columns "features" and "label" > -- > > Key: SPARK-15509 > URL: https://issues.apache.org/jira/browse/SPARK-15509 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Joseph K. Bradley > > Currently in SparkR, when you load a LibSVM dataset using the sqlContext and > then pass it to an MLlib algorithm, the ML wrappers will fail since they will > try to create a "features" column, which conflicts with the existing > "features" column from the LibSVM loader. E.g., using the "mnist" dataset > from LibSVM: > {code} > training <- loadDF(sqlContext, ".../mnist", "libsvm") > model <- naiveBayes(label ~ features, training) > {code} > This fails with: > {code} > 16/05/24 11:52:41 ERROR RBackendHandler: fit on > org.apache.spark.ml.r.NaiveBayesWrapper failed > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : > java.lang.IllegalArgumentException: Output column features already exists. > at > org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) > at > scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) > at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) > at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) > at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) > at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) > at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) > at > org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) > at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca > {code} > The same issue appears for the "label" column once you rename the "features" > column. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15542) Make error message clear for script './R/install-dev.sh' when R is missing on Mac
[ https://issues.apache.org/jira/browse/SPARK-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren updated SPARK-15542: Description: I followed the instructions here https://github.com/apache/spark/tree/master/R to build the SparkR project. When running {code}build/mvn -DskipTests -Psparkr package{code} I got the error below: {code} [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM ... SUCCESS [ 23.589 s] [INFO] Spark Project Tags . SUCCESS [ 19.389 s] [INFO] Spark Project Sketch ... SUCCESS [ 6.386 s] [INFO] Spark Project Networking ... SUCCESS [ 12.296 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 7.817 s] [INFO] Spark Project Unsafe ... SUCCESS [ 10.825 s] [INFO] Spark Project Launcher . SUCCESS [ 12.262 s] [INFO] Spark Project Core . FAILURE [01:40 min] [INFO] Spark Project GraphX ... SKIPPED [INFO] Spark Project Streaming SKIPPED [INFO] Spark Project Catalyst . SKIPPED [INFO] Spark Project SQL .. SKIPPED [INFO] Spark Project ML Local Library . SKIPPED [INFO] Spark Project ML Library ... SKIPPED [INFO] Spark Project Tools SKIPPED [INFO] Spark Project Hive . SKIPPED [INFO] Spark Project REPL . SKIPPED [INFO] Spark Project Assembly . SKIPPED [INFO] Spark Project External Flume Sink .. SKIPPED [INFO] Spark Project External Flume ... SKIPPED [INFO] Spark Project External Flume Assembly .. SKIPPED [INFO] Spark Integration for Kafka 0.8 SKIPPED [INFO] Spark Project Examples . SKIPPED [INFO] Spark Project External Kafka Assembly .. SKIPPED [INFO] Spark Project Java 8 Tests . SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 03:14 min [INFO] Finished at: 2016-05-25T21:51:58+00:00 [INFO] Final Memory: 55M/782M [INFO] [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (sparkr-pkg) on project spark-core_2.11: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :spark-core_2.11 {code} This error turned out to be caused by {code}./R/install-dev.sh{code}. I then ran the install-dev.sh script directly, and got {code} mbp185-xr:spark xin$ ./R/install-dev.sh usage: dirname path {code} This message was very confusing to me; I then found that R was not properly configured on my Mac, and the script uses {code}$(which R){code} to get the R home. I tried the same situation on CentOS with R missing, and it gives a very clear error message, while macOS does not. On CentOS: {code} [root@ip-xxx-31-9-xx spark]# which R /usr/bin/which: no R in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin:/root/bin){code} but on Mac, if R is not found, nothing is returned, and this causes the confusing R build failure when running R/install-dev.sh: {code} mbp185-xr:spark xin$ which R mbp185-xr:spark xin$ {code} So a clearer message is needed for this R misconfiguration when running R/install-dev.sh. was: I followed instructions here https://github.com/apache/spark/tree/master/R to build sparkR project.
When running {code}build/mvn -DskipTests -Psparkr package{code} then I got error below: {code} [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM ... SUCCESS [ 23.589 s] [INFO] Spark Project Tags . SUCCESS [ 19.389 s] [INFO] Spark Project Sketch ... SUCCESS [ 6.386 s] [INFO] Spark Project Networking
[jira] [Created] (SPARK-15542) Make error message clear for script './R/install-dev.sh' when R is missing on Mac
Xin Ren created SPARK-15542: --- Summary: Make error message clear for script './R/install-dev.sh' when R is missing on Mac Key: SPARK-15542 URL: https://issues.apache.org/jira/browse/SPARK-15542 Project: Spark Issue Type: Improvement Components: SparkR Affects Versions: 2.0.0 Environment: Mac OS El Capitan Reporter: Xin Ren Priority: Minor I followed the instructions here https://github.com/apache/spark/tree/master/R to build the SparkR project. When running {code}build/mvn -DskipTests -Psparkr package{code} I got the error below: {code} [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM ... SUCCESS [ 23.589 s] [INFO] Spark Project Tags . SUCCESS [ 19.389 s] [INFO] Spark Project Sketch ... SUCCESS [ 6.386 s] [INFO] Spark Project Networking ... SUCCESS [ 12.296 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 7.817 s] [INFO] Spark Project Unsafe ... SUCCESS [ 10.825 s] [INFO] Spark Project Launcher . SUCCESS [ 12.262 s] [INFO] Spark Project Core . FAILURE [01:40 min] [INFO] Spark Project GraphX ... SKIPPED [INFO] Spark Project Streaming SKIPPED [INFO] Spark Project Catalyst . SKIPPED [INFO] Spark Project SQL .. SKIPPED [INFO] Spark Project ML Local Library . SKIPPED [INFO] Spark Project ML Library ... SKIPPED [INFO] Spark Project Tools SKIPPED [INFO] Spark Project Hive . SKIPPED [INFO] Spark Project REPL . SKIPPED [INFO] Spark Project Assembly . SKIPPED [INFO] Spark Project External Flume Sink .. SKIPPED [INFO] Spark Project External Flume ... SKIPPED [INFO] Spark Project External Flume Assembly .. SKIPPED [INFO] Spark Integration for Kafka 0.8 SKIPPED [INFO] Spark Project Examples . SKIPPED [INFO] Spark Project External Kafka Assembly .. SKIPPED [INFO] Spark Project Java 8 Tests . SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 03:14 min [INFO] Finished at: 2016-05-25T21:51:58+00:00 [INFO] Final Memory: 55M/782M [INFO] [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (sparkr-pkg) on project spark-core_2.11: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :spark-core_2.11 {code} This error turned out to be caused by {code}./R/install-dev.sh{code}. I then ran the install-dev.sh script directly, and got {code} mbp185-xr:spark quickmobile$ ./R/install-dev.sh usage: dirname path {code} This message was very confusing to me; I then found that R was not properly configured on my Mac, and the script uses {code}$(which R){code} to get the R home. I tried the same situation on CentOS with R missing, and it gives a very clear error message, while macOS does not.
On CentOS: {code} [root@ip-xxx-31-9-xx spark]# which R /usr/bin/which: no R in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin:/root/bin){code} but on Mac, if R is not found, nothing is returned, and this causes the confusing R build failure when running R/install-dev.sh: {code} mbp185-xr:spark xin$ which R mbp185-xr:spark xin$ {code} So a clearer message is needed for this R misconfiguration when running R/install-dev.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298787#comment-15298787 ] Xin Ren commented on SPARK-15509: - Hi Joseph, I'd like to try to fix this one. Thanks a lot :) > R MLlib algorithms should support input columns "features" and "label" > -- > > Key: SPARK-15509 > URL: https://issues.apache.org/jira/browse/SPARK-15509 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley > > Currently in SparkR, when you load a LibSVM dataset using the sqlContext and > then pass it to an MLlib algorithm, the ML wrappers will fail since they will > try to create a "features" column, which conflicts with the existing > "features" column from the LibSVM loader. E.g., using the "mnist" dataset > from LibSVM: > {code} > training <- loadDF(sqlContext, ".../mnist", "libsvm") > model <- naiveBayes(label ~ features, training) > {code} > This fails with: > {code} > 16/05/24 11:52:41 ERROR RBackendHandler: fit on > org.apache.spark.ml.r.NaiveBayesWrapper failed > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : > java.lang.IllegalArgumentException: Output column features already exists. > at > org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179) > at > scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) > at > scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) > at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) > at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179) > at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67) > at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131) > at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169) > at > org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62) > at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca > {code} > The same issue appears for the "label" column once you rename the "features" > column. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15130) PySpark decision tree params should include default values to match Scala
[ https://issues.apache.org/jira/browse/SPARK-15130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271531#comment-15271531 ] Xin Ren commented on SPARK-15130: - Hi, I just found that the DecisionTreeClassifier class in PySpark has a setParams method, which roughly matches the Scala one. Do you mean to create a separate "Param" class? {code} @keyword_only @since("1.4.0") def setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", probabilityCol="probability", rawPredictionCol="rawPrediction", maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, impurity="gini", seed=None): """ setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", \ probabilityCol="probability", rawPredictionCol="rawPrediction", \ maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, \ maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, impurity="gini", \ seed=None) Sets params for the DecisionTreeClassifier. """ kwargs = self.setParams._input_kwargs return self._set(**kwargs) {code} > PySpark decision tree params should include default values to match Scala > - > > Key: SPARK-15130 > URL: https://issues.apache.org/jira/browse/SPARK-15130 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML, PySpark >Reporter: holdenk >Priority: Minor > > As part of checking the documentation in SPARK-14813, PySpark decision tree > params do not include the default values (unlike the Scala ones). While the > existing Scala default values will still be used, this information is likely > worth exposing in the docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
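For reference, the Scala side records these defaults through Params.setDefault, which is where the values the PySpark docs should mirror come from. A minimal self-contained sketch with toy params (not Spark's actual DecisionTreeParams; the class name and doc strings are illustrative):
{code}
import org.apache.spark.ml.param.{IntParam, ParamMap, Params}
import org.apache.spark.ml.util.Identifiable

class ToyTreeParams(override val uid: String) extends Params {
  def this() = this(Identifiable.randomUID("toyTree"))

  val maxDepth = new IntParam(this, "maxDepth", "maximum depth of the tree")
  val maxBins = new IntParam(this, "maxBins", "max number of bins per feature")

  // These are the defaults the Python docstrings should also state.
  setDefault(maxDepth -> 5, maxBins -> 32)

  override def copy(extra: ParamMap): ToyTreeParams = defaultCopy(extra)
}
{code}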
[jira] [Commented] (SPARK-14817) ML, Graph, R 2.0 QA: Programming guide update and migration guide
[ https://issues.apache.org/jira/browse/SPARK-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269775#comment-15269775 ] Xin Ren commented on SPARK-14817: - ok, I'll start looking for new APIs. So just create new tickets under SPARK-14815? > ML, Graph, R 2.0 QA: Programming guide update and migration guide > - > > Key: SPARK-14817 > URL: https://issues.apache.org/jira/browse/SPARK-14817 > Project: Spark > Issue Type: Sub-task > Components: Documentation, GraphX, ML, MLlib, SparkR >Reporter: Joseph K. Bradley > > Before the release, we need to update the MLlib, GraphX, and SparkR > Programming Guides. Updates will include: > * Add migration guide subsection. > ** Use the results of the QA audit JIRAs and [SPARK-13448]. > * Check phrasing, especially in main sections (for outdated items such as "In > this release, ...") > For MLlib, we will make the DataFrame-based API (spark.ml) front-and-center, > to make it clear the RDD-based API is the older, maintenance-mode one. > * No docs for spark.mllib will be deleted; they will just be reorganized and > put in a subsection. > * If spark.ml docs are less complete, or if spark.ml docs say "refer to the > spark.mllib docs for details," then we should copy those details to the > spark.ml docs. This per-feature work can happen under [SPARK-14815]. > * This big reorganization should be done *after* docs are added for each > feature (to minimize merge conflicts). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14936) FlumePollingStreamSuite is slow
[ https://issues.apache.org/jira/browse/SPARK-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263129#comment-15263129 ] Xin Ren commented on SPARK-14936: - I'm trying to fix this one now :) > FlumePollingStreamSuite is slow > --- > > Key: SPARK-14936 > URL: https://issues.apache.org/jira/browse/SPARK-14936 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Reporter: Josh Rosen > > FlumePollingStreamSuite contains two tests which run for a minute each. This > seems excessively slow and we should speed it up if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
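One common way to speed such tests up (an assumption about the direction of a fix, not the actual patch) is to replace fixed sleeps with tight polling, e.g. via ScalaTest's Eventually:
{code}
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.{Millis, Seconds, Span}

object PollingSketch {
  // Poll every 100 ms for up to 10 s instead of sleeping for fixed
  // multi-second periods; the call returns as soon as the condition holds.
  def awaitEvents(received: () => Int, expected: Int): Unit =
    eventually(timeout(Span(10, Seconds)), interval(Span(100, Millis))) {
      assert(received() == expected)
    }
}
{code}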
[jira] [Commented] (SPARK-14935) DistributedSuite "local-cluster format" shouldn't actually launch clusters
[ https://issues.apache.org/jira/browse/SPARK-14935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260648#comment-15260648 ] Xin Ren commented on SPARK-14935: - I'd like to have a try on this one, thanks a lot :) > DistributedSuite "local-cluster format" shouldn't actually launch clusters > -- > > Key: SPARK-14935 > URL: https://issues.apache.org/jira/browse/SPARK-14935 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Reporter: Josh Rosen > Labels: starter > > In DistributedSuite, the "local-cluster format" test actually launches a > bunch of clusters, but this doesn't seem necessary for what should just be a > unit test of a regex. We should clean up the code so that this is testable > without actually launching a cluster, which should buy us about 20 seconds > per build. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
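To illustrate what a regex-only version of that test could look like, here is a sketch; the pattern below is an assumption modeled on the `local-cluster[numWorkers, coresPerWorker, memoryPerWorker]` master format, not the exact constant Spark uses:
{code}
object LocalClusterRegexCheck extends App {
  // Matches e.g. "local-cluster[2,1,1024]" and captures the three numbers.
  val LocalCluster =
    """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*\]""".r

  def parse(master: String): Option[(Int, Int, Int)] = master match {
    case LocalCluster(n, cores, mem) => Some((n.toInt, cores.toInt, mem.toInt))
    case _ => None
  }

  // These assertions never launch a cluster, so they run in milliseconds.
  assert(parse("local-cluster[2,1,1024]") == Some((2, 1, 1024)))
  assert(parse("local-cluster[ 2, 1, 1024 ]") == Some((2, 1, 1024)))
  assert(parse("local[4]").isEmpty)
}
{code}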
[jira] [Commented] (SPARK-14817) ML 2.0 QA: Programming guide update and migration guide
[ https://issues.apache.org/jira/browse/SPARK-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254371#comment-15254371 ] Xin Ren commented on SPARK-14817: - count me in too :) > ML 2.0 QA: Programming guide update and migration guide > --- > > Key: SPARK-14817 > URL: https://issues.apache.org/jira/browse/SPARK-14817 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, MLlib >Reporter: Joseph K. Bradley > > Before the release, we need to update the MLlib Programming Guide. Updates > will include: > * Make the DataFrame-based API (spark.ml) front-and-center, to make it clear > the RDD-based API is the older, maintenance-mode one. > ** No docs for spark.mllib will be deleted; they will just be reorganized and > put in a subsection. > ** If spark.ml docs are less complete, or if spark.ml docs say "refer to the > spark.mllib docs for details," then we should copy those details to the > spark.ml docs. > * Add migration guide subsection. > ** Use the results of the QA audit JIRAs. > * Check phrasing, especially in main sections (for outdated items such as "In > this release, ...") > If you would like to work on this task, please comment, and we can create & > link JIRAs for parts of this work (which should be broken into pieces for > this larger 2.0 release). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14569) Log instrumentation in KMeans
[ https://issues.apache.org/jira/browse/SPARK-14569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243228#comment-15243228 ] Xin Ren commented on SPARK-14569: - Hi I'd like to have a try on this one, thanks a lot :) > Log instrumentation in KMeans > - > > Key: SPARK-14569 > URL: https://issues.apache.org/jira/browse/SPARK-14569 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Timothy Hunter > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
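A hedged sketch of the kind of instrumentation being asked for (the object and method names are illustrative; Spark's internal Instrumentation helper may differ): log the training parameters and data size when fit() starts, and the outcome when it finishes, so every training run leaves a grep-able record.
{code}
import org.slf4j.LoggerFactory

object KMeansInstrumentationSketch {
  private val log = LoggerFactory.getLogger(getClass)

  // Called at the start of fit(): record params and input size.
  def logTrainingStart(k: Int, maxIter: Int, numExamples: Long): Unit =
    log.info(s"KMeans training started: k=$k, maxIter=$maxIter, numExamples=$numExamples")

  // Called at the end of fit(): record the result summary.
  def logTrainingEnd(cost: Double, actualIter: Int): Unit =
    log.info(s"KMeans training finished: cost=$cost after $actualIter iterations")
}
{code}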
[jira] [Commented] (SPARK-14300) Scala MLlib examples code merge and clean up
[ https://issues.apache.org/jira/browse/SPARK-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220206#comment-15220206 ] Xin Ren commented on SPARK-14300: - ok > Scala MLlib examples code merge and clean up > > > Key: SPARK-14300 > URL: https://issues.apache.org/jira/browse/SPARK-14300 > Project: Spark > Issue Type: Sub-task > Components: Examples >Reporter: Xusen Yin >Priority: Minor > Labels: starter > > Duplicated code that I found in scala/examples/mllib: > * scala/mllib > ** DecisionTreeRunner.scala > ** DenseGaussianMixture.scala > ** DenseKMeans.scala > ** GradientBoostedTreesRunner.scala > ** LDAExample.scala > ** LinearRegression.scala > ** SparseNaiveBayes.scala > ** StreamingLinearRegression.scala > ** StreamingLogisticRegression.scala > ** TallSkinnyPCA.scala > ** TallSkinnySVD.scala > * Unsure code duplications (need double check) > ** AbstractParams.scala > ** BinaryClassification.scala > ** Correlations.scala > ** CosineSimilarity.scala > ** DenseGaussianMixture.scala > ** FPGrowthExample.scala > ** MovieLensALS.scala > ** MultivariateSummarizer.scala > ** RandomRDDGeneration.scala > ** SampledRDDs.scala > When merging and cleaning that code, be sure not to disturb the existing > example on/off blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14300) Scala MLlib examples code merge and clean up
[ https://issues.apache.org/jira/browse/SPARK-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220178#comment-15220178 ] Xin Ren commented on SPARK-14300: - Hi Xusen, I can work on this one, thanks a lot :) > Scala MLlib examples code merge and clean up > > > Key: SPARK-14300 > URL: https://issues.apache.org/jira/browse/SPARK-14300 > Project: Spark > Issue Type: Sub-task > Components: Examples >Reporter: Xusen Yin >Priority: Minor > Labels: starter > > Duplicated code that I found in scala/examples/mllib: > * scala/mllib > ** DecisionTreeRunner.scala > ** DenseGaussianMixture.scala > ** DenseKMeans.scala > ** GradientBoostedTreesRunner.scala > ** LDAExample.scala > ** LinearRegression.scala > ** SparseNaiveBayes.scala > ** StreamingLinearRegression.scala > ** StreamingLogisticRegression.scala > ** TallSkinnyPCA.scala > ** TallSkinnySVD.scala > * Unsure code duplications (need double check) > ** AbstractParams.scala > ** BinaryClassification.scala > ** Correlations.scala > ** CosineSimilarity.scala > ** DenseGaussianMixture.scala > ** FPGrowthExample.scala > ** MovieLensALS.scala > ** MultivariateSummarizer.scala > ** RandomRDDGeneration.scala > ** SampledRDDs.scala > When merging and cleaning that code, be sure not to disturb the existing > example on/off blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-13765) method specialStateTransition(int, IntStream) is exceeding the 65535 bytes limit
[ https://issues.apache.org/jira/browse/SPARK-13765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren closed SPARK-13765. --- Resolution: Not A Problem It was caused by the way I imported the project. The error pops up when I run "./build/sbt eclipse" and then import the result directly as an existing Eclipse project. When I import it into Eclipse as a Maven project instead, the error is gone. > method specialStateTransition(int, IntStream) is exceeding the 65535 bytes > limit > > > Key: SPARK-13765 > URL: https://issues.apache.org/jira/browse/SPARK-13765 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 > Environment: Eclipse-Scala IDE > sitting on master branch >Reporter: Xin Ren > Attachments: Screen Shot 2016-03-08 at 9.52.48 PM.png > > > Eclipse-Scala IDE complains about a Java problem (*please see attached > screenshot*), but IntelliJ does not. > I'm not sure whether it is a bug or not. > {code} > The code of method specialStateTransition(int, IntStream) is exceeding the > 65535 bytes limit > SparkSqlParser_IdentifiersParser.java > /spark-catalyst_2.11/target/generated-sources/antlr3/org/apache/spark/sql/catalyst/parser >line 40380 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-13019) Replace example code in mllib-statistics.md using include_example
[ https://issues.apache.org/jira/browse/SPARK-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Ren reopened SPARK-13019: - need to fix scala-2.10 compile > Replace example code in mllib-statistics.md using include_example > - > > Key: SPARK-13019 > URL: https://issues.apache.org/jira/browse/SPARK-13019 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Xusen Yin >Assignee: Xin Ren >Priority: Minor > Labels: starter > Fix For: 2.0.0 > > > The example code in the user guide is embedded in the markdown and hence it > is not easy to test. It would be nice to automatically test them. This JIRA > is to discuss options to automate example code testing and see what we can do > in Spark 1.6. > Goal is to move actual example code to spark/examples and test compilation in > Jenkins builds. Then in the markdown, we can reference part of the code to > show in the user guide. This requires adding a Jekyll tag that is similar to > https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, > e.g., called include_example. > {code}{% include_example > scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala %}{code} > Jekyll will find > `examples/src/main/scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala` > and pick code blocks marked "example" and replace code block in > {code}{% highlight %}{code} > in the markdown. > See more sub-tasks in parent ticket: > https://issues.apache.org/jira/browse/SPARK-11337 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
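For anyone picking this up, the shape of an examples/ file that the include_example tag consumes looks roughly like this (a skeleton only, assuming "$example on$"/"$example off$" marker comments; the body is elided, not real code):
{code}
// examples/src/main/scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala
object SummaryStatisticsExample {
  def main(args: Array[String]): Unit = {
    // Setup code out here is compiled in Jenkins builds but not shown in the guide.
    // $example on$
    // ...the statistics code that should appear in mllib-statistics.md...
    // $example off$
  }
}
{code}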
[jira] [Commented] (SPARK-13660) ContinuousQuerySuite floods the logs with garbage
[ https://issues.apache.org/jira/browse/SPARK-13660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194085#comment-15194085 ] Xin Ren commented on SPARK-13660: - Thank you Shixiong > ContinuousQuerySuite floods the logs with garbage > - > > Key: SPARK-13660 > URL: https://issues.apache.org/jira/browse/SPARK-13660 > Project: Spark > Issue Type: Test > Components: Tests >Reporter: Shixiong Zhu > Labels: starter > > https://github.com/apache/spark/pull/11439 added a utility method > "testQuietly". We can use it for ContinuousQuerySuite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
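For reference, a sketch of what such a testQuietly helper does (modeled on the utility the linked PR describes; the exact Spark signature may differ): raise the log threshold around one test body so expected failures don't flood the build output.
{code}
import org.apache.log4j.{Level, LogManager}

// Run `body` with the root log level raised to ERROR, then restore it, so a
// test that intentionally fails queries doesn't spam INFO/WARN lines.
def quietly[T](body: => T): T = {
  val rootLogger = LogManager.getRootLogger
  val oldLevel = rootLogger.getLevel
  rootLogger.setLevel(Level.ERROR)
  try body finally rootLogger.setLevel(oldLevel)
}
{code}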