[jira] [Commented] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214181#comment-16214181
 ] 

Apache Spark commented on SPARK-22327:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/19550

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> WARNING: There was 1 warning.
> NOTE: There were 2 notes.
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause of the issue is the package version check in the CRAN check: 
> after SparkR package version 2.1.2 was first published, any older version 
> fails the version check. As far as we know, there is no way to skip this 
> version check.
> Also, there was previously a NOTE about the new maintainer.






[jira] [Commented] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214180#comment-16214180
 ] 

Apache Spark commented on SPARK-22327:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/19549

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> WARNING: There was 1 warning.
> NOTE: There were 2 notes.
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause of the issue is the package version check in the CRAN check: 
> after SparkR package version 2.1.2 was first published, any older version 
> fails the version check. As far as we know, there is no way to skip this 
> version check.
> Also, there was previously a NOTE about the new maintainer.






[jira] [Assigned] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22327:


Assignee: (was: Apache Spark)

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> WARNING: There was 1 warning.
> NOTE: There were 2 notes.
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause of the issue is the package version check in the CRAN check: 
> after SparkR package version 2.1.2 was first published, any older version 
> fails the version check. As far as we know, there is no way to skip this 
> version check.
> Also, there was previously a NOTE about the new maintainer.






[jira] [Assigned] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22327:


Assignee: Apache Spark

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>Assignee: Apache Spark
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> WARNING: There was 1 warning.
> NOTE: There were 2 notes.
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause of the issue is the package version check in the CRAN check: 
> after SparkR package version 2.1.2 was first published, any older version 
> fails the version check. As far as we know, there is no way to skip this 
> version check.
> Also, there was previously a NOTE about the new maintainer.






[jira] [Assigned] (SPARK-22303) [SQL] Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE

2017-10-21 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22303:


Assignee: (was: Apache Spark)

> [SQL] Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE
> ---
>
> Key: SPARK-22303
> URL: https://issues.apache.org/jira/browse/SPARK-22303
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Kohki Nishio
>Priority: Minor
>
> When a table contains columns such as BINARY_DOUBLE or BINARY_FLOAT, the 
> JDBC connector throws an SQL exception:
> {code}
> java.sql.SQLException: Unsupported type 101
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:235)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:292)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:292)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:291)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:64)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:47)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
> {code}
> These types are Oracle-specific ones, described here:
> https://docs.oracle.com/cd/E11882_01/timesten.112/e21642/types.htm#TTSQL148






[jira] [Commented] (SPARK-22303) [SQL] Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE

2017-10-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214164#comment-16214164
 ] 

Apache Spark commented on SPARK-22303:
--

User 'taroplus' has created a pull request for this issue:
https://github.com/apache/spark/pull/19548

> [SQL] Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE
> ---
>
> Key: SPARK-22303
> URL: https://issues.apache.org/jira/browse/SPARK-22303
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Kohki Nishio
>Priority: Minor
>
> When a table contains columns such as BINARY_DOUBLE or BINARY_FLOAT, the 
> JDBC connector throws an SQL exception:
> {code}
> java.sql.SQLException: Unsupported type 101
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:235)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:292)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:292)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:291)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:64)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:47)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
> {code}
> These types are Oracle-specific ones, described here:
> https://docs.oracle.com/cd/E11882_01/timesten.112/e21642/types.htm#TTSQL148






[jira] [Assigned] (SPARK-22303) [SQL] Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE

2017-10-21 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22303:


Assignee: Apache Spark

> [SQL] Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE
> ---
>
> Key: SPARK-22303
> URL: https://issues.apache.org/jira/browse/SPARK-22303
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Kohki Nishio
>Assignee: Apache Spark
>Priority: Minor
>
> When a table contains columns such as BINARY_DOUBLE or BINARY_FLOAT, the 
> JDBC connector throws an SQL exception:
> {code}
> java.sql.SQLException: Unsupported type 101
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:235)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:292)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:292)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:291)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:64)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:47)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
> {code}
> These types are Oracle-specific ones, described here:
> https://docs.oracle.com/cd/E11882_01/timesten.112/e21642/types.htm#TTSQL148






[jira] [Commented] (SPARK-22303) [SQL] Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE

2017-10-21 Thread Kohki Nishio (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214163#comment-16214163
 ] 

Kohki Nishio commented on SPARK-22303:
--

https://github.com/apache/spark/pull/19548

> [SQL] Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE
> ---
>
> Key: SPARK-22303
> URL: https://issues.apache.org/jira/browse/SPARK-22303
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Kohki Nishio
>Priority: Minor
>
> When a table contains columns such as BINARY_DOUBLE or BINARY_FLOAT, the 
> JDBC connector throws an SQL exception:
> {code}
> java.sql.SQLException: Unsupported type 101
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:235)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:292)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:292)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:291)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:64)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:47)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
> {code}
> These types are Oracle-specific ones, described here:
> https://docs.oracle.com/cd/E11882_01/timesten.112/e21642/types.htm#TTSQL148






[jira] [Reopened] (SPARK-22303) [SQL] Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE

2017-10-21 Thread Kohki Nishio (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kohki Nishio reopened SPARK-22303:
--

Spark supports Oracle-specific types; I'm working on a PR, so please keep this open.
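
A user-side workaround that may be possible for this class of error, independent of that PR, is to register a custom JdbcDialect that maps the Oracle-specific type codes to Catalyst types. The sketch below assumes the Oracle driver reports BINARY_FLOAT and BINARY_DOUBLE as type codes 100 and 101 (101 matches the error above; 100 is an assumption), and the dialect object name is made up:

{code}
// Minimal sketch of a user-registered dialect; not the approach taken in the PR above.
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, DoubleType, FloatType, MetadataBuilder}

object OracleBinaryTypesDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    sqlType match {
      case 100 => Some(FloatType)   // assumed Oracle type code for BINARY_FLOAT
      case 101 => Some(DoubleType)  // Oracle type code for BINARY_DOUBLE (per the error above)
      case _   => None              // defer to the built-in mapping
    }
}

// Register before creating the JDBC DataFrame, e.g. before spark.read.jdbc(...)
JdbcDialects.registerDialect(OracleBinaryTypesDialect)
{code}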

> [SQL] Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE
> ---
>
> Key: SPARK-22303
> URL: https://issues.apache.org/jira/browse/SPARK-22303
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Kohki Nishio
>Priority: Minor
>
> When a table contains columns such as BINARY_DOUBLE or BINARY_FLOAT, the 
> JDBC connector throws an SQL exception:
> {code}
> java.sql.SQLException: Unsupported type 101
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:235)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:292)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:292)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:291)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:64)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:47)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
> {code}
> These types are Oracle-specific ones, described here:
> https://docs.oracle.com/cd/E11882_01/timesten.112/e21642/types.htm#TTSQL148






[jira] [Updated] (SPARK-21657) Spark has exponential time complexity to explode(array of structs)

2017-10-21 Thread Ruslan Dautkhanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruslan Dautkhanov updated SPARK-21657:
--
Affects Version/s: 2.3.0
   Issue Type: Bug  (was: Improvement)

> Spark has exponential time complexity to explode(array of structs)
> --
>
> Key: SPARK-21657
> URL: https://issues.apache.org/jira/browse/SPARK-21657
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.0.0, 2.1.0, 2.1.1, 2.2.0, 2.3.0
>Reporter: Ruslan Dautkhanov
>  Labels: cache, caching, collections, nested_types, performance, 
> pyspark, sparksql, sql
> Attachments: ExponentialTimeGrowth.PNG, 
> nested-data-generator-and-test.py
>
>
> It can take up to half a day to explode a modest-sized nested collection 
> (0.5m records) on recent Xeon processors.
> See the attached pyspark script that reproduces this problem.
> {code}
> cached_df = sqlc.sql('select individ, hholdid, explode(amft) from ' + 
> table_name).cache()
> print cached_df.count()
> {code}
> This script generates a number of tables with the same total number of 
> records across all nested collections (see the `scaling` variable in the loops). 
> The `scaling` variable scales up how many nested elements are in each record, and by 
> the same factor scales down the number of records in the table, so the total number 
> of records stays the same.
> Time grows exponentially (notice the log-10 vertical axis scale):
> !ExponentialTimeGrowth.PNG!
> At a scaling of 50,000 (see the attached pyspark script), it took 7 hours to 
> explode the nested collections (!) of 8k records.
> Beyond 1,000 elements per nested collection, time grows exponentially.






[jira] [Assigned] (SPARK-22302) Remove manual backports for subprocess.check_output and check_call

2017-10-21 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-22302:


Assignee: Hyukjin Kwon

> Remove manual backports for subprocess.check_output and check_call
> --
>
> Key: SPARK-22302
> URL: https://issues.apache.org/jira/browse/SPARK-22302
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Trivial
> Fix For: 2.3.0
>
>
> This JIRA is loosely related to SPARK-21573. To my knowledge, based on past 
> cases and investigations, Python 2.6 could still be used in Jenkins, and it 
> appears to fail to execute some other scripts.
> In this particular case, it was:
> {code}
> cd dev && python2.6
> {code}
> {code}
> >>> from sparktestsupport import shellutils
> >>> shellutils.subprocess_check_call("ls")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "sparktestsupport/shellutils.py", line 46, in subprocess_check_call
> retcode = call(*popenargs, **kwargs)
> NameError: global name 'call' is not defined
> {code}
> Please see 
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3950/console
> Since we dropped Python 2.6.x support, it looks better to remove those 
> workarounds and print out explicit error messages, so that we do not duplicate 
> the effort of finding the root causes of such cases.






[jira] [Resolved] (SPARK-22302) Remove manual backports for subprocess.check_output and check_call

2017-10-21 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-22302.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19524
[https://github.com/apache/spark/pull/19524]

> Remove manual backports for subprocess.check_output and check_call
> --
>
> Key: SPARK-22302
> URL: https://issues.apache.org/jira/browse/SPARK-22302
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Priority: Trivial
> Fix For: 2.3.0
>
>
> This JIRA is loosely related to SPARK-21573. To my knowledge, based on past 
> cases and investigations, Python 2.6 could still be used in Jenkins, and it 
> appears to fail to execute some other scripts.
> In this particular case, it was:
> {code}
> cd dev && python2.6
> {code}
> {code}
> >>> from sparktestsupport import shellutils
> >>> shellutils.subprocess_check_call("ls")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "sparktestsupport/shellutils.py", line 46, in subprocess_check_call
> retcode = call(*popenargs, **kwargs)
> NameError: global name 'call' is not defined
> {code}
> Please see 
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3950/console
> Since we dropped Python 2.6.x support, it looks better to remove those 
> workarounds and print out explicit error messages, so that we do not duplicate 
> the effort of finding the root causes of such cases.






[jira] [Created] (SPARK-22328) ClosureCleaner misses referenced superclass fields, gives them null values

2017-10-21 Thread Ryan Williams (JIRA)
Ryan Williams created SPARK-22328:
-

 Summary: ClosureCleaner misses referenced superclass fields, gives 
them null values
 Key: SPARK-22328
 URL: https://issues.apache.org/jira/browse/SPARK-22328
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Ryan Williams


[Runnable repro here|https://github.com/ryan-williams/spark-bugs/tree/closure]:

Superclass with some fields:
{code}
import org.apache.spark.{SparkConf, SparkContext}

abstract class App extends Serializable {
  // SparkContext stub
  @transient lazy val sc = new SparkContext(
    new SparkConf()
      .setAppName("test")
      .setMaster("local[4]")
      .set("spark.ui.showConsoleProgress", "false"))

  // These fields get missed by the ClosureCleaner in some situations
  val n1 = 111
  val s1 = "aaa"

  // Simple scaffolding to exercise passing a closure to RDD.foreach in 
subclasses
  def rdd = sc.parallelize(1 to 1)
  def run(name: String): Unit = {
print(s"$name:\t")
body()
sc.stop()
  }
  def body(): Unit
}
{code}

Running a simple Spark job with various instantiations of this class:

{code}
object Main {
  /** [[App]]s generated this way will not correctly detect references to 
[[App.n1]] in Spark closures */
  val fn = () ⇒ new App {
val n2 = 222
val s2 = "bbb"
def body(): Unit = rdd.foreach { _ ⇒ println(s"$n1, $n2, $s1, $s2") }
  }

  /** Doesn't serialize closures correctly */
  val app1 = fn()

  /** Works fine */
  val app2 =
new App {
  val n2 = 222
  val s2 = "bbb"
  def body(): Unit = rdd.foreach { _ ⇒ println(s"$n1, $n2, $s1, $s2") }
}

  /** [[App]]s created this way also work fine */
  def makeApp(): App =
new App {
  val n2 = 222
  val s2 = "bbb"
  def body(): Unit = rdd.foreach { _ ⇒ println(s"$n1, $n2, $s1, $s2") }
}

  val app3 = makeApp()  // ok

  val fn2 = () ⇒ makeApp()  // ok

  def main(args: Array[String]): Unit = {
fn().run("fn")// bad: n1 → 0, s1 → null
app1.run("app1")  // bad: n1 → 0, s1 → null
app2.run("app2")  // ok
app3.run("app3")  // ok
fn2().run("fn2")  // ok
  }
}
{code}

Build + Run:

{code}
$ sbt run
…
fn: 0, 222, null, bbb
app1:   0, 222, null, bbb
app2:   111, 222, aaa, bbb
app3:   111, 222, aaa, bbb
fn2:111, 222, aaa, bbb
{code}

The first two versions have {{0}} and {{null}}, respectively, for the {{App.n1}} and 
{{App.s1}} fields.

Something about this syntax causes the problem:

{code}
() => new App { … }
{code}






[jira] [Commented] (SPARK-14540) Support Scala 2.12 closures and Java 8 lambdas in ClosureCleaner

2017-10-21 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213890#comment-16213890
 ] 

Sean Owen commented on SPARK-14540:
---

Thanks [~lrytz] -- by the way I confirmed that 2.12.4 does fix this particular 
issue.

I'm on to other issues in Spark with respect to the new lambda-based 
implementation of closures in Scala. For example, closures compile to functions 
with names containing "$Lambda$" rather than "$anonfun$", and some classes that 
turn up for cleaning have names that don't map to the class file that they're 
in. I've gotten through a few of these issues and may post a WIP PR for 
feedback, but haven't resolved them all.
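
The naming difference can be seen with a small standalone program; the exact names below are indicative only and vary with the compiler version and where the closure is defined:

{code}
object LambdaNameDemo {
  def main(args: Array[String]): Unit = {
    val f = (x: Int) => x + 1
    // Compiled with Scala 2.11 this prints an anonymous-class name containing
    // "$anonfun$", e.g. LambdaNameDemo$$anonfun$1; compiled with Scala 2.12 the
    // function is a LambdaMetafactory-generated class whose name contains
    // "$Lambda$", e.g. LambdaNameDemo$$$Lambda$1/123456789.
    println(f.getClass.getName)
  }
}
{code}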

> Support Scala 2.12 closures and Java 8 lambdas in ClosureCleaner
> 
>
> Key: SPARK-14540
> URL: https://issues.apache.org/jira/browse/SPARK-14540
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Josh Rosen
>
> Using https://github.com/JoshRosen/spark/tree/build-for-2.12, I tried running 
> ClosureCleanerSuite with Scala 2.12 and ran into two bad test failures:
> {code}
> [info] - toplevel return statements in closures are identified at cleaning 
> time *** FAILED *** (32 milliseconds)
> [info]   Expected exception 
> org.apache.spark.util.ReturnStatementInClosureException to be thrown, but no 
> exception was thrown. (ClosureCleanerSuite.scala:57)
> {code}
> and
> {code}
> [info] - user provided closures are actually cleaned *** FAILED *** (56 
> milliseconds)
> [info]   Expected ReturnStatementInClosureException, but got 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task not 
> serializable: java.io.NotSerializableException: java.lang.Object
> [info]- element of array (index: 0)
> [info]- array (class "[Ljava.lang.Object;", size: 1)
> [info]- field (class "java.lang.invoke.SerializedLambda", name: 
> "capturedArgs", type: "class [Ljava.lang.Object;")
> [info]- object (class "java.lang.invoke.SerializedLambda", 
> SerializedLambda[capturingClass=class 
> org.apache.spark.util.TestUserClosuresActuallyCleaned$, 
> functionalInterfaceMethod=scala/runtime/java8/JFunction1$mcII$sp.apply$mcII$sp:(I)I,
>  implementation=invokeStatic 
> org/apache/spark/util/TestUserClosuresActuallyCleaned$.org$apache$spark$util$TestUserClosuresActuallyCleaned$$$anonfun$69:(Ljava/lang/Object;I)I,
>  instantiatedMethodType=(I)I, numCaptured=1])
> [info]- element of array (index: 0)
> [info]- array (class "[Ljava.lang.Object;", size: 1)
> [info]- field (class "java.lang.invoke.SerializedLambda", name: 
> "capturedArgs", type: "class [Ljava.lang.Object;")
> [info]- object (class "java.lang.invoke.SerializedLambda", 
> SerializedLambda[capturingClass=class org.apache.spark.rdd.RDD, 
> functionalInterfaceMethod=scala/Function3.apply:(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;,
>  implementation=invokeStatic 
> org/apache/spark/rdd/RDD.org$apache$spark$rdd$RDD$$$anonfun$20$adapted:(Lscala/Function1;Lorg/apache/spark/TaskContext;Ljava/lang/Object;Lscala/collection/Iterator;)Lscala/collection/Iterator;,
>  
> instantiatedMethodType=(Lorg/apache/spark/TaskContext;Ljava/lang/Object;Lscala/collection/Iterator;)Lscala/collection/Iterator;,
>  numCaptured=1])
> [info]- field (class "org.apache.spark.rdd.MapPartitionsRDD", name: 
> "f", type: "interface scala.Function3")
> [info]- object (class "org.apache.spark.rdd.MapPartitionsRDD", 
> MapPartitionsRDD[2] at apply at Transformer.scala:22)
> [info]- field (class "scala.Tuple2", name: "_1", type: "class 
> java.lang.Object")
> [info]- root object (class "scala.Tuple2", (MapPartitionsRDD[2] at 
> apply at 
> Transformer.scala:22,org.apache.spark.SparkContext$$Lambda$957/431842435@6e803685)).
> [info]   This means the closure provided by user is not actually cleaned. 
> (ClosureCleanerSuite.scala:78)
> {code}
> We'll need to figure out a closure cleaning strategy which works for 2.12 
> lambdas.






[jira] [Updated] (SPARK-21551) pyspark's collect fails when getaddrinfo is too slow

2017-10-21 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-21551:
--
Fix Version/s: 2.0.3

> pyspark's collect fails when getaddrinfo is too slow
> 
>
> Key: SPARK-21551
> URL: https://issues.apache.org/jira/browse/SPARK-21551
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: peay
>Assignee: peay
>Priority: Critical
> Fix For: 2.0.3, 2.1.3, 2.2.1, 2.3.0
>
>
> Pyspark's {{RDD.collect}}, as well as {{DataFrame.toLocalIterator}} and 
> {{DataFrame.collect}} all work by starting an ephemeral server in the driver, 
> and having Python connect to it to download the data.
> All three are implemented along the lines of:
> {code}
> port = self._jdf.collectToPython()
> return list(_load_from_socket(port, BatchedSerializer(PickleSerializer())))
> {code}
> The server has **a hardcoded timeout of 3 seconds** 
> (https://github.com/apache/spark/blob/e26dac5feb02033f980b1e69c9b0ff50869b6f9e/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L695)
>  -- i.e., the Python process has 3 seconds to connect to it from the very 
> moment the driver server starts.
> In general, that seems fine, but I have been encountering frequent timeouts 
> leading to `Exception: could not open socket`.
> After investigating a bit, it turns out that {{_load_from_socket}} makes a 
> call to {{getaddrinfo}}:
> {code}
> def _load_from_socket(port, serializer):
>     sock = None
>     # Support for both IPv4 and IPv6.
>     # On most of IPv6-ready systems, IPv6 will take precedence.
>     for res in socket.getaddrinfo("localhost", port, socket.AF_UNSPEC,
>                                   socket.SOCK_STREAM):
>         .. connect ..
> {code}
> I am not sure why, but while most such calls to {{getaddrinfo}} on my machine 
> only take a couple milliseconds, about 10% of them take between 2 and 10 
> seconds, leading to about 10% of jobs failing. I don't think we can always 
> expect {{getaddrinfo}} to return instantly. More generally, Python may 
> sometimes pause for a couple seconds, which may not leave enough time for the 
> process to connect to the server.
> Especially since the server timeout is hardcoded, I think it would be best to 
> set a rather generous value (15 seconds?) to avoid such situations.
> A {{getaddrinfo}}  specific fix could avoid doing it every single time, or do 
> it before starting up the driver server.
>  
> cc SPARK-677 [~davies]
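
For reference, the driver-side pattern described above can be sketched as follows; this is a simplified stand-in for PythonRDD's ephemeral server, not the actual code, and the 15000 ms value is only the "rather generous" timeout suggested in the report:

{code}
import java.net.{InetAddress, ServerSocket, SocketTimeoutException}

val connectTimeoutMs = 15000  // generous value instead of the hardcoded 3000
val serverSocket = new ServerSocket(0, 1, InetAddress.getByName("localhost"))
serverSocket.setSoTimeout(connectTimeoutMs)
println(s"serving on port ${serverSocket.getLocalPort}")
try {
  // Blocks until the Python side connects, or throws if the timeout fires first.
  val sock = serverSocket.accept()
  // ... write the serialized rows to sock.getOutputStream here ...
  sock.close()
} catch {
  case _: SocketTimeoutException =>
    sys.error(s"client did not connect within $connectTimeoutMs ms")
} finally {
  serverSocket.close()
}
{code}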






[jira] [Resolved] (SPARK-22303) [SQL] Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE

2017-10-21 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-22303.
---
Resolution: Won't Fix

> [SQL] Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE
> ---
>
> Key: SPARK-22303
> URL: https://issues.apache.org/jira/browse/SPARK-22303
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Kohki Nishio
>Priority: Minor
>
> When a table contains columns such as BINARY_DOUBLE or BINARY_FLOAT, this 
> JDBC connector throws SQL exception
> {code}
> java.sql.SQLException: Unsupported type 101
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:235)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:292)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:292)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:291)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:64)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:113)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:47)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
> {code}
> these types are Oracle specific ones, described here
> https://docs.oracle.com/cd/E11882_01/timesten.112/e21642/types.htm#TTSQL148






[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22327:
-
Description: 
with warning
* checking CRAN incoming feasibility ... WARNING
Maintainer: 'Shivaram Venkataraman '
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'
WARNING: There was 1 warning.
NOTE: There were 2 notes.

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1.

The root cause of the issue is the package version check in the CRAN check: 
after SparkR package version 2.1.2 was first published, any older version 
fails the version check. As far as we know, there is no way to skip this 
version check.

Also, there was previously a NOTE about the new maintainer.

  was:
with warning
* checking CRAN incoming feasibility ... WARNING
Maintainer: 'Shivaram Venkataraman '
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1.

The root cause of the issue is the package version check in the CRAN check: 
after SparkR package version 2.1.2 was first published, any older version 
fails the version check. As far as we know, there is no way to skip this 
version check.

Also, there was previously a NOTE about the new maintainer.


> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> WARNING: There was 1 warning.
> NOTE: There were 2 notes.
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause of the issue is the package version check in the CRAN check: 
> after SparkR package version 2.1.2 was first published, any older version 
> fails the version check. As far as we know, there is no way to skip this 
> version check.
> Also, there was previously a NOTE about the new maintainer.






[jira] [Comment Edited] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213798#comment-16213798
 ] 

Felix Cheung edited comment on SPARK-22327 at 10/21/17 7:45 AM:


in contrast, this is from master

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman '
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'
* checking package dependencies ... NOTE
  No repository set, so cyclic dependency check skipped
* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.
NOTE: There were 3 notes.

So it should have 3 notes, or (2 notes + 1 warning), as the one note turns into 
a warning for Insufficient package version.


was (Author: felixcheung):
in contrast, this is from master

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman '
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'
* checking package dependencies ... NOTE
  No repository set, so cyclic dependency check skipped
* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.
NOTE: There were 3 notes.

So it should have 3 notes, or (2 notes + 1 warning), as the one note turns into a 
warning for Insufficient package version.

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause of the issue is the package version check in the CRAN check: 
> after SparkR package version 2.1.2 was first published, any older version 
> fails the version check. As far as we know, there is no way to skip this 
> version check.
> Also, there was previously a NOTE about the new maintainer.






[jira] [Commented] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213798#comment-16213798
 ] 

Felix Cheung commented on SPARK-22327:
--

in contrast, this is from master

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman '
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'
* checking package dependencies ... NOTE
  No repository set, so cyclic dependency check skipped
* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.
NOTE: There were 3 notes.

So it should have 3 notes, or (2 notes + 1 warning), as the one note turns into a 
warning for Insufficient package version.

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause of the issue is the package version check in the CRAN check: 
> after SparkR package version 2.1.2 was first published, any older version 
> fails the version check. As far as we know, there is no way to skip this 
> version check.
> Also, there was previously a NOTE about the new maintainer.






[jira] [Comment Edited] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213798#comment-16213798
 ] 

Felix Cheung edited comment on SPARK-22327 at 10/21/17 7:45 AM:


in contrast, this is from master 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82919/consoleFull

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman '
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'
* checking package dependencies ... NOTE
  No repository set, so cyclic dependency check skipped
* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.
NOTE: There were 3 notes.

So it should have 3 notes, or (2 notes + 1 warning), as the one note turns into 
a warning for Insufficient package version.


was (Author: felixcheung):
in contrast, this is from master

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman '
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'
* checking package dependencies ... NOTE
  No repository set, so cyclic dependency check skipped
* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.
NOTE: There were 3 notes.

So it should have 3 notes, or (2 notes + 1 warning), as the one note turns into 
a warning for Insufficient package version.

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause of the issue is the package version check in the CRAN check: 
> after SparkR package version 2.1.2 was first published, any older version 
> fails the version check. As far as we know, there is no way to skip this 
> version check.
> Also, there was previously a NOTE about the new maintainer.






[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22327:
-
Description: 
with warning
* checking CRAN incoming feasibility ... WARNING
Maintainer: 'Shivaram Venkataraman '
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1.

The root cause of the issue is the package version check in the CRAN check: 
after SparkR package version 2.1.2 was first published, any older version 
fails the version check. As far as we know, there is no way to skip this 
version check.

Also, there was previously a NOTE about the new maintainer.

  was:
with warning
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1.

The root cause of the issue is the package version check in the CRAN check: 
after SparkR package version 2.1.2 was first published, any older version 
fails the version check. As far as we know, there is no way to skip this 
version check.

Also, there was previously a NOTE about the new maintainer.


> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause of the issue is the package version check in the CRAN check: 
> after SparkR package version 2.1.2 was first published, any older version 
> fails the version check. As far as we know, there is no way to skip this 
> version check.
> Also, there was previously a NOTE about the new maintainer.






[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22327:
-
Description: 
with warning
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1.

The root cause of the issue is the package version check in the CRAN check: 
after SparkR package version 2.1.2 was first published, any older version 
fails the version check. As far as we know, there is no way to skip this 
version check.

Also, there was previously a NOTE about the new maintainer.

  was:
with error
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1


> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause of the issue is the package version check in the CRAN check: 
> after SparkR package version 2.1.2 was first published, any older version 
> fails the version check. As far as we know, there is no way to skip this 
> version check.
> Also, there was previously a NOTE about the new maintainer.






[jira] [Commented] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213781#comment-16213781
 ] 

Felix Cheung commented on SPARK-22327:
--

https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3956/consoleFull

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with error
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1






[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22327:
-
Affects Version/s: 2.3.0

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with error
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1






[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22327:
-
Affects Version/s: 2.2.1

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1
>Reporter: Felix Cheung
>
> with error
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1






[jira] [Created] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-22327:


 Summary: R CRAN check fails on non-latest branches
 Key: SPARK-22327
 URL: https://issues.apache.org/jira/browse/SPARK-22327
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 1.6.4, 2.0.3, 2.1.3
Reporter: Felix Cheung


with error
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1






[jira] [Commented] (SPARK-20331) Broaden support for Hive partition pruning predicate pushdown

2017-10-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213775#comment-16213775
 ] 

Apache Spark commented on SPARK-20331:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/19547

> Broaden support for Hive partition pruning predicate pushdown
> -
>
> Key: SPARK-20331
> URL: https://issues.apache.org/jira/browse/SPARK-20331
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Michael Allman
>Assignee: Michael Allman
> Fix For: 2.3.0
>
>
> Spark 2.1 introduced scalable support for Hive tables with huge numbers of 
> partitions. Key to leveraging this support is the ability to prune 
> unnecessary table partitions to answer queries. Spark supports a subset of 
> the class of partition pruning predicates that the Hive metastore supports. 
> If a user writes a query with a partition pruning predicate that is *not* 
> supported by Spark, Spark falls back to loading all partitions and pruning 
> client-side. We want to broaden Spark's current partition pruning predicate 
> pushdown capabilities.
> One of the key missing capabilities is support for disjunctions. For example, 
> for a table partitioned by date, specifying a predicate like
> {code}date = 20161011 or date = 20161014{code}
> will result in Spark fetching all partitions. For a table partitioned by date 
> and hour, querying a range of hours across dates can be quite difficult to 
> accomplish without fetching all partition metadata.
> The current partition pruning implementation supports only comparisons against 
> literals. We can expand that to foldable expressions by evaluating them at 
> planning time.
> We can also implement support for the "IN" comparison by expanding it to a 
> sequence of "OR"s.
> This ticket covers those enhancements.
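
The "IN"-to-"OR" expansion can be illustrated with a small self-contained sketch; the expression classes below are toy stand-ins, not Spark's Catalyst classes:

{code}
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Any) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr
case class In(attr: Attr, values: Seq[Lit]) extends Expr

// Rewrite `attr IN (v1, v2, ...)` as `attr = v1 OR attr = v2 OR ...`, a form that
// a literal-comparison-only pushdown (like the one described above) can handle.
def expandIn(e: Expr): Expr = e match {
  case In(attr, values) if values.nonEmpty =>
    values.map(v => EqualTo(attr, v): Expr).reduce(Or(_, _))
  case other => other
}

// date IN (20161011, 20161014)  ==>  Or(EqualTo(date, 20161011), EqualTo(date, 20161014))
println(expandIn(In(Attr("date"), Seq(Lit(20161011), Lit(20161014)))))
{code}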


