[jira] [Commented] (SPARK-24467) VectorAssemblerEstimator

2018-07-03 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532324#comment-16532324
 ] 

Liang-Chi Hsieh commented on SPARK-24467:
-

The approach similar to the one-hot encoder sounds good to me.

> VectorAssemblerEstimator
> 
>
> Key: SPARK-24467
> URL: https://issues.apache.org/jira/browse/SPARK-24467
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: Joseph K. Bradley
>Priority: Major
>
> In [SPARK-22346], I believe I made a wrong API decision: I recommended adding 
> `VectorSizeHint` instead of making `VectorAssembler` into an Estimator since 
> I thought the latter option would break most workflows.  However, I should 
> have proposed:
> * Add a Param to VectorAssembler for specifying the sizes of Vectors in the 
> inputCols.  This Param can be optional.  If not given, then VectorAssembler 
> will behave as it does now.  If given, then VectorAssembler can use that info 
> instead of figuring out the Vector sizes via metadata or examining Rows in 
> the data (though it could do consistency checks).
> * Add a VectorAssemblerEstimator which gets the Vector lengths from data and 
> produces a VectorAssembler with the vector lengths Param specified.
> This will not break existing workflows.  Migrating to 
> VectorAssemblerEstimator will be easier than adding VectorSizeHint since it 
> will not require users to manually input Vector lengths.
> Note: Even with this Estimator, VectorSizeHint might prove useful for other 
> things in the future which require vector length metadata, so we could 
> consider keeping it rather than deprecating it.
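A rough PySpark sketch of how the proposed API might look in use. The size param and the {{VectorAssemblerEstimator}} class are hypothetical names taken from this proposal, not an existing API; only the plain VectorAssembler part runs today.
{code:python}
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.master("local[1]").appName("assembler-sketch").getOrCreate()
df = spark.createDataFrame([(Vectors.dense([1.0, 2.0]), 3.0)], ["v", "x"])

# Today's transformer, unchanged when no sizes are given.
assembler = VectorAssembler(inputCols=["v", "x"], outputCol="features")
assembler.transform(df).show()

# Proposed (hypothetical) additions sketched from the description above:
#   - an optional param on VectorAssembler holding the size of each input vector column,
#     so transform() no longer needs metadata or a pass over the rows;
#   - a VectorAssemblerEstimator that scans the data once, learns those sizes,
#     and returns a VectorAssembler with that param already set:
# estimator = VectorAssemblerEstimator(inputCols=["v", "x"], outputCol="features")
# fitted_assembler = estimator.fit(df)   # vector sizes captured here
# fitted_assembler.transform(df)         # no size inference needed at transform time
{code}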






[jira] [Commented] (SPARK-24422) Add JDK9+ in our Jenkins' build servers

2018-07-03 Thread vaquar khan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532268#comment-16532268
 ] 

vaquar khan commented on SPARK-24422:
-

I can't see any update on Jenkins and Java 9 support; the last update was 
published on 2017-04-10 by [Baptiste 
Mathus|https://jenkins.io/blog/2017/04/10/jenkins-has-upgraded-to-java-8/#about-the-author]:

 
*Java 9 compatibility*
At this point, Jenkins does not yet support Java 9 development releases.

[https://jenkins.io/blog/2017/04/10/jenkins-has-upgraded-to-java-8/]

 

> Add JDK9+ in our Jenkins' build servers
> ---
>
> Key: SPARK-24422
> URL: https://issues.apache.org/jira/browse/SPARK-24422
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Priority: Major
>







[jira] [Commented] (SPARK-24421) sun.misc.Unsafe in JDK9+

2018-07-03 Thread vaquar khan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532262#comment-16532262
 ] 

vaquar khan commented on SPARK-24421:
-

1) I would suggest using the "Java Dependency Analysis Tool" to identify all code 
that depends on the rt.jar and tools.jar files, as both of them were removed 
in Java 9:

[https://wiki.openjdk.java.net/display/JDK8/Java+Dependency+Analysis+Tool]

 

2) We need to run the code on Java 7, 8, and 9 to identify backward-compatibility 
issues, since Java 9 does not provide full backward compatibility.

 

> sun.misc.Unsafe in JDK9+
> 
>
> Key: SPARK-24421
> URL: https://issues.apache.org/jira/browse/SPARK-24421
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Priority: Major
>
> Many internal APIs, such as Unsafe, are encapsulated in JDK9+; see 
> http://openjdk.java.net/jeps/260 for details.
> To use Unsafe, we need to add *jdk.unsupported* to our code’s module 
> declaration:
> {code:java}
> module java9unsafe {
> requires jdk.unsupported;
> }
> {code}






[jira] [Assigned] (SPARK-24681) Cannot create a view from a table when a nested column name contains ':'

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24681:


Assignee: (was: Apache Spark)

> Cannot create a view from a table when a nested column name contains ':'
> 
>
> Key: SPARK-24681
> URL: https://issues.apache.org/jira/browse/SPARK-24681
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Adrian Ionescu
>Priority: Major
>
> Here's a patch that reproduces the issue: 
> {code:java}
> diff --git 
> a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> index 09c1547..29bb3db 100644 
> --- 
> a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> +++ 
> b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> @@ -19,6 +19,7 @@ package org.apache.spark.sql.hive 
>  
> import org.apache.spark.sql.{QueryTest, Row} 
> import org.apache.spark.sql.execution.datasources.parquet.ParquetTest 
> +import org.apache.spark.sql.functions.{lit, struct} 
> import org.apache.spark.sql.hive.test.TestHiveSingleton 
>  
> case class Cases(lower: String, UPPER: String) 
> @@ -76,4 +77,21 @@ class HiveParquetSuite extends QueryTest with ParquetTest 
> with TestHiveSingleton 
>   } 
> } 
>   } 
> + 
> +  test("column names including ':' characters") { 
> +    withTempPath { path => 
> +  withTable("test_table") { 
> +    spark.range(0) 
> +  .select(struct(lit(0).as("nested:column")).as("toplevel:column")) 
> +  .write.format("parquet") 
> +  .option("path", path.getCanonicalPath) 
> +  .saveAsTable("test_table") 
> + 
> +    sql("CREATE VIEW test_view_1 AS SELECT `toplevel:column`.* FROM 
> test_table") 
> +    sql("CREATE VIEW test_view_2 AS SELECT * FROM test_table") 
> + 
> +  } 
> +    } 
> +  } 
> }{code}
> The first "CREATE VIEW" statement succeeds, but the second one fails with:
> {code:java}
> org.apache.spark.SparkException: Cannot recognize hive type string: 
> struct
> {code}






[jira] [Assigned] (SPARK-24681) Cannot create a view from a table when a nested column name contains ':'

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24681:


Assignee: Apache Spark

> Cannot create a view from a table when a nested column name contains ':'
> 
>
> Key: SPARK-24681
> URL: https://issues.apache.org/jira/browse/SPARK-24681
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Adrian Ionescu
>Assignee: Apache Spark
>Priority: Major
>
> Here's a patch that reproduces the issue: 
> {code:java}
> diff --git 
> a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> index 09c1547..29bb3db 100644 
> --- 
> a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> +++ 
> b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> @@ -19,6 +19,7 @@ package org.apache.spark.sql.hive 
>  
> import org.apache.spark.sql.{QueryTest, Row} 
> import org.apache.spark.sql.execution.datasources.parquet.ParquetTest 
> +import org.apache.spark.sql.functions.{lit, struct} 
> import org.apache.spark.sql.hive.test.TestHiveSingleton 
>  
> case class Cases(lower: String, UPPER: String) 
> @@ -76,4 +77,21 @@ class HiveParquetSuite extends QueryTest with ParquetTest 
> with TestHiveSingleton 
>   } 
> } 
>   } 
> + 
> +  test("column names including ':' characters") { 
> +    withTempPath { path => 
> +  withTable("test_table") { 
> +    spark.range(0) 
> +  .select(struct(lit(0).as("nested:column")).as("toplevel:column")) 
> +  .write.format("parquet") 
> +  .option("path", path.getCanonicalPath) 
> +  .saveAsTable("test_table") 
> + 
> +    sql("CREATE VIEW test_view_1 AS SELECT `toplevel:column`.* FROM 
> test_table") 
> +    sql("CREATE VIEW test_view_2 AS SELECT * FROM test_table") 
> + 
> +  } 
> +    } 
> +  } 
> }{code}
> The first "CREATE VIEW" statement succeeds, but the second one fails with:
> {code:java}
> org.apache.spark.SparkException: Cannot recognize hive type string: 
> struct
> {code}






[jira] [Commented] (SPARK-24681) Cannot create a view from a table when a nested column name contains ':'

2018-07-03 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532234#comment-16532234
 ] 

Apache Spark commented on SPARK-24681:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/21711

> Cannot create a view from a table when a nested column name contains ':'
> 
>
> Key: SPARK-24681
> URL: https://issues.apache.org/jira/browse/SPARK-24681
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Adrian Ionescu
>Priority: Major
>
> Here's a patch that reproduces the issue: 
> {code:java}
> diff --git 
> a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> index 09c1547..29bb3db 100644 
> --- 
> a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> +++ 
> b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> @@ -19,6 +19,7 @@ package org.apache.spark.sql.hive 
>  
> import org.apache.spark.sql.{QueryTest, Row} 
> import org.apache.spark.sql.execution.datasources.parquet.ParquetTest 
> +import org.apache.spark.sql.functions.{lit, struct} 
> import org.apache.spark.sql.hive.test.TestHiveSingleton 
>  
> case class Cases(lower: String, UPPER: String) 
> @@ -76,4 +77,21 @@ class HiveParquetSuite extends QueryTest with ParquetTest 
> with TestHiveSingleton 
>   } 
> } 
>   } 
> + 
> +  test("column names including ':' characters") { 
> +    withTempPath { path => 
> +  withTable("test_table") { 
> +    spark.range(0) 
> +  .select(struct(lit(0).as("nested:column")).as("toplevel:column")) 
> +  .write.format("parquet") 
> +  .option("path", path.getCanonicalPath) 
> +  .saveAsTable("test_table") 
> + 
> +    sql("CREATE VIEW test_view_1 AS SELECT `toplevel:column`.* FROM 
> test_table") 
> +    sql("CREATE VIEW test_view_2 AS SELECT * FROM test_table") 
> + 
> +  } 
> +    } 
> +  } 
> }{code}
> The first "CREATE VIEW" statement succeeds, but the second one fails with:
> {code:java}
> org.apache.spark.SparkException: Cannot recognize hive type string: 
> struct
> {code}






[jira] [Resolved] (SPARK-24732) Type coercion between MapTypes.

2018-07-03 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-24732.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21703
[https://github.com/apache/spark/pull/21703]

> Type coercion between MapTypes.
> ---
>
> Key: SPARK-24732
> URL: https://issues.apache.org/jira/browse/SPARK-24732
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 2.4.0
>
>
> It seems currently we don't allow type coercion between maps.
> We can support type coercion between MapTypes where both the key types and 
> the value types are compatible.
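As a hedged illustration of what this enables (assuming a local PySpark session; exact behavior depends on the Spark version), a union whose branches produce map columns with compatible but different key types should now analyze, widening to a common map type instead of failing:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("map-coercion-sketch").getOrCreate()

# Branch one is MAP<INT, STRING>, branch two is MAP<BIGINT, STRING>;
# with coercion between MapTypes the union column should widen to MAP<BIGINT, STRING>.
df = spark.sql(
    "SELECT map(1, 'a') AS m UNION ALL SELECT map(CAST(2 AS BIGINT), 'b') AS m")
df.printSchema()
{code}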






[jira] [Assigned] (SPARK-24732) Type coercion between MapTypes.

2018-07-03 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-24732:


Assignee: Takuya Ueshin

> Type coercion between MapTypes.
> ---
>
> Key: SPARK-24732
> URL: https://issues.apache.org/jira/browse/SPARK-24732
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>
> It seems currently we don't allow type coercion between maps.
> We can support type coercion between MapTypes where both the key types and 
> the value types are compatible.






[jira] [Assigned] (SPARK-24207) PrefixSpan: R API

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24207:


Assignee: (was: Apache Spark)

> PrefixSpan: R API
> -
>
> Key: SPARK-24207
> URL: https://issues.apache.org/jira/browse/SPARK-24207
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Priority: Major
>







[jira] [Assigned] (SPARK-24207) PrefixSpan: R API

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24207:


Assignee: Apache Spark

> PrefixSpan: R API
> -
>
> Key: SPARK-24207
> URL: https://issues.apache.org/jira/browse/SPARK-24207
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-24207) PrefixSpan: R API

2018-07-03 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532196#comment-16532196
 ] 

Apache Spark commented on SPARK-24207:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/21710

> PrefixSpan: R API
> -
>
> Key: SPARK-24207
> URL: https://issues.apache.org/jira/browse/SPARK-24207
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Priority: Major
>







[jira] [Commented] (SPARK-24644) Pyarrow exception while running pandas_udf on pyspark 2.3.1

2018-07-03 Thread Hichame El Khalfi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532178#comment-16532178
 ] 

Hichame El Khalfi commented on SPARK-24644:
---

Hello [~hyukjin.kwon]

Thanks for taking the time to look at this ticket.

Regarding the environment, we are using:
 * CentOS 7
 * JDK 1.8.0_101-b13
 * CPython interpreter 2.7
 * Spark 2.3.1 in distributed mode.
 * pandas 0.13.0
 * pyarrow 0.9.0

 

> Pyarrow exception while running pandas_udf on pyspark 2.3.1
> ---
>
> Key: SPARK-24644
> URL: https://issues.apache.org/jira/browse/SPARK-24644
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 2.3.1
> Environment: os: centos
> pyspark 2.3.1
> spark 2.3.1
> pyarrow >= 0.8.0
>Reporter: Hichame El Khalfi
>Priority: Major
>
> Hello,
> When I try to run a `pandas_udf` on my spark dataframe, I get this error
>  
> {code:java}
>   File 
> "/mnt/ephemeral3/yarn/nm/usercache/user/appcache/application_1524574803975_205774/container_e280_1524574803975_205774_01_44/pyspark.zip/pyspark/serializers.py",
>  lin
> e 280, in load_stream
> pdf = batch.to_pandas()
>   File "pyarrow/table.pxi", line 677, in pyarrow.lib.RecordBatch.to_pandas 
> (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:43226)
> return Table.from_batches([self]).to_pandas(nthreads=nthreads)
>   File "pyarrow/table.pxi", line 1043, in pyarrow.lib.Table.to_pandas 
> (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:46331)
> mgr = pdcompat.table_to_blockmanager(options, self, memory_pool,
>   File "/usr/lib64/python2.7/site-packages/pyarrow/pandas_compat.py", line 
> 528, in table_to_blockmanager
> blocks = _table_to_blocks(options, block_table, nthreads, memory_pool)
>   File "/usr/lib64/python2.7/site-packages/pyarrow/pandas_compat.py", line 
> 622, in _table_to_blocks
> return [_reconstruct_block(item) for item in result]
>   File "/usr/lib64/python2.7/site-packages/pyarrow/pandas_compat.py", line 
> 446, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
> TypeError: make_block() takes at least 3 arguments (2 given)
> {code}
>  
>  More than happy to provide any additional information
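For reference, a minimal scalar pandas_udf of the kind that exercises this Arrow-to-pandas conversion path is sketched below (illustrative names; it assumes pyarrow and pandas are installed on the workers, and whether it reproduces the error will depend on those versions):
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.master("local[1]").appName("pandas-udf-check").getOrCreate()
df = spark.range(0, 10).toDF("v")

@pandas_udf("long", PandasUDFType.SCALAR)
def plus_one(v):
    # v arrives as a pandas.Series built from an Arrow batch; that conversion
    # is the step that fails with make_block() in the reported traceback.
    return v + 1

df.select(plus_one(df["v"])).show()
{code}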






[jira] [Commented] (SPARK-24530) Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken

2018-07-03 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532160#comment-16532160
 ] 

Hyukjin Kwon commented on SPARK-24530:
--

[~mengxr], I lowered the priority to {{Critical}} for now since I believe this 
doesn't block the release, even though it is still critical. Please revert my 
action if you think differently; I don't mind.

> Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) 
> and pyspark.ml docs are broken
> ---
>
> Key: SPARK-24530
> URL: https://issues.apache.org/jira/browse/SPARK-24530
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.4.0
>Reporter: Xiangrui Meng
>Assignee: Hyukjin Kwon
>Priority: Critical
> Attachments: Screen Shot 2018-06-12 at 8.23.18 AM.png, Screen Shot 
> 2018-06-12 at 8.23.29 AM.png, image-2018-06-13-15-15-51-025.png, 
> pyspark-ml-doc-utuntu18.04-python2.7-sphinx-1.7.5.png
>
>
> I generated the Python docs from master locally using `make html`. However, the 
> generated HTML doc doesn't render class docs correctly. I attached 
> screenshots from the Spark 2.3 docs and from the master docs generated locally. Not 
> sure if this is because of my local setup.
> cc: [~dongjoon] Could you help verify?
>  
> The following is the status of our released docs. Some recent docs seem to be 
> broken.
> *2.1.x*
> (O) 
> [https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (O) 
> [https://spark.apache.org/docs/2.1.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.1.2/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.2.x*
> (O) 
> [https://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.2.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.3.x*
> (O) 
> [https://spark.apache.org/docs/2.3.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]






[jira] [Updated] (SPARK-24530) Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken

2018-07-03 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-24530:
-
Priority: Critical  (was: Blocker)

> Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) 
> and pyspark.ml docs are broken
> ---
>
> Key: SPARK-24530
> URL: https://issues.apache.org/jira/browse/SPARK-24530
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.4.0
>Reporter: Xiangrui Meng
>Assignee: Hyukjin Kwon
>Priority: Critical
> Attachments: Screen Shot 2018-06-12 at 8.23.18 AM.png, Screen Shot 
> 2018-06-12 at 8.23.29 AM.png, image-2018-06-13-15-15-51-025.png, 
> pyspark-ml-doc-utuntu18.04-python2.7-sphinx-1.7.5.png
>
>
> I generated the Python docs from master locally using `make html`. However, the 
> generated HTML doc doesn't render class docs correctly. I attached 
> screenshots from the Spark 2.3 docs and from the master docs generated locally. Not 
> sure if this is because of my local setup.
> cc: [~dongjoon] Could you help verify?
>  
> The following is the status of our released docs. Some recent docs seem to be 
> broken.
> *2.1.x*
> (O) 
> [https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (O) 
> [https://spark.apache.org/docs/2.1.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.1.2/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.2.x*
> (O) 
> [https://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.2.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.3.x*
> (O) 
> [https://spark.apache.org/docs/2.3.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]






[jira] [Created] (SPARK-24736) --py-files appears to pass non-local URL's into PYTHONPATH directly.

2018-07-03 Thread Jonathan A Weaver (JIRA)
Jonathan A Weaver created SPARK-24736:
-

 Summary: --py-files appears to pass non-local URL's into 
PYTHONPATH directly.
 Key: SPARK-24736
 URL: https://issues.apache.org/jira/browse/SPARK-24736
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, PySpark
Affects Versions: 2.4.0
 Environment: Recent 2.4.0 from master branch, submitted on Linux to a 
KOPS Kubernetes cluster created on AWS.

 
Reporter: Jonathan A Weaver


My spark-submit:

bin/spark-submit \
        --master k8s://https://internal-api-test-k8s-local-7afed8-796273878.us-east-1.elb.amazonaws.com \
        --deploy-mode cluster \
        --name pytest \
        --conf spark.kubernetes.container.image=412834075398.dkr.ecr.us-east-1.amazonaws.com/fids/pyspark-k8s:latest \
        --conf spark.kubernetes.driver.pod.name=spark-pi-driver \
        --conf spark.kubernetes.authenticate.submission.caCertFile=cluster.ca \
        --conf spark.kubernetes.authenticate.submission.oauthToken=$TOK \
        --conf spark.kubernetes.authenticate.driver.oauthToken=$TOK \
        --py-files "https://s3.amazonaws.com/maxar-ids-fids/screw.zip" \
        https://s3.amazonaws.com/maxar-ids-fids/it.py

*screw.zip is successfully downloaded and placed in SparkFiles.getRootPath()*

2018-07-01 07:33:43 INFO  SparkContext:54 - Added file 
https://s3.amazonaws.com/maxar-ids-fids/screw.zip at 
https://s3.amazonaws.com/maxar-ids-fids/screw.zip with timestamp 1530430423297
2018-07-01 07:33:43 INFO  Utils:54 - Fetching 
https://s3.amazonaws.com/maxar-ids-fids/screw.zip to 
/var/data/spark-7aba748d-2bba-4015-b388-c2ba9adba81e/spark-0ed5a100-6efa-45ca-ad4c-d1e57af76ffd/userFiles-a053206e-33d9-4245-b587-f8ac26d4c240/fetchFileTemp1549645948768432992.tmp

*I print out the PYTHONPATH and PYSPARK_FILES environment variables from the 
driver script:*
     PYTHONPATH 
/opt/spark/python/lib/pyspark.zip:/opt/spark/python/lib/py4j-0.10.7-src.zip:/opt/spark/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/opt/spark/python/lib/pyspark.zip:/opt/spark/python/lib/py4j-*.zip:*https://s3.amazonaws.com/maxar-ids-fids/screw.zip*
    PYSPARK_FILES https://s3.amazonaws.com/maxar-ids-fids/screw.zip

*I print out sys.path*
['/tmp/spark-fec3684b-8b63-4f43-91a4-2f2fa41a1914', 
u'/var/data/spark-7aba748d-2bba-4015-b388-c2ba9adba81e/spark-0ed5a100-6efa-45ca-ad4c-d1e57af76ffd/userFiles-a053206e-33d9-4245-b587-f8ac26d4c240',
 '/opt/spark/python/lib/pyspark.zip', 
'/opt/spark/python/lib/py4j-0.10.7-src.zip', 
'/opt/spark/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar', 
'/opt/spark/python/lib/py4j-*.zip', *'/opt/spark/work-dir/https', 
'//s3.amazonaws.com/maxar-ids-fids/screw.zip',*
 '/usr/lib/python27.zip', '/usr/lib/python2.7', 
'/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk', 
'/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', 
'/usr/lib/python2.7/site-packages']

*The URL from PYTHONPATH gets placed in sys.path verbatim, with obvious results.*
 
*Dump of spark config from container.*
Spark config dumped:
[(u'spark.master', 
u'k8s://[https://internal-api-test-k8s-local-7afed8-796273878.us-east-1.elb.amazonaws.com|https://internal-api-test-k8s-local-7afed8-796273878.us-east-1.elb.amazonaws.com/]'),
 (u'spark.kubernetes.authenticate.submission.oauthToken', 
u''), 
(u'spark.kubernetes.authenticate.driver.oauthToken', 
u''), (u'spark.kubernetes.executor.podNamePrefix', 
u'pytest-1530430411996'), (u'spark.kubernetes.memoryOverheadFactor', u'0.4'), 
(u'spark.driver.blockManager.port', u'7079'), 
(u'[spark.app.id|http://spark.app.id/]', u'spark-application-1530430424433'), 
(u'[spark.app.name|http://spark.app.name/]', u'pytest'), 
(u'[spark.executor.id|http://spark.executor.id/]', u'driver'), 
(u'spark.driver.host', u'pytest-1530430411996-driver-svc.default.svc'), 
(u'spark.kubernetes.container.image', 
u'[412834075398.dkr.ecr.us-east-1.amazonaws.com/fids/pyspark-k8s:latest'|http://412834075398.dkr.ecr.us-east-1.amazonaws.com/fids/pyspark-k8s:latest']),
 (u'spark.driver.port', u'7078'), (u'spark.kubernetes.python.mainAppResource', 
u'[https://s3.amazonaws.com/maxar-ids-fids/it.py']), 
(u'spark.kubernetes.authenticate.submission.caCertFile', 
u'[cluster.ca|http://cluster.ca/]'), (u'spark.rdd.compress', u'True'), 
(u'spark.driver.bindAddress', u'100.120.0.1'), 
(u'[spark.kubernetes.driver.pod.name|http://spark.kubernetes.driver.pod.name/]',
 u'spark-pi-driver'), (u'spark.serializer.objectStreamReset', u'100'), 
(u'spark.files', 
u'[https://s3.amazonaws.com/maxar-ids-fids/it.py,https://s3.amazonaws.com/maxar-ids-fids/screw.zip']),
 (u'spark.kubernetes.python.pyFiles', 
u'[https://s3.am

[jira] [Commented] (SPARK-24530) Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken

2018-07-03 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532159#comment-16532159
 ] 

Hyukjin Kwon commented on SPARK-24530:
--

There's a workaround for this. In short, Sphinx for Python 3 is required and 
should be installed, e.g. {{sudo pip3 install sphinx}}; if Sphinx for Python 2 
is already installed, it should be removed first, before installing Sphinx for 
Python 3.

Currently, whether Sphinx for Python 3 is being used can be checked manually by 
running {{cd python/docs && make clean html}} and verifying that the keyword 
arguments are shown in the equivalent link above, before making the release 
documentation for the Python API.

> Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) 
> and pyspark.ml docs are broken
> ---
>
> Key: SPARK-24530
> URL: https://issues.apache.org/jira/browse/SPARK-24530
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.4.0
>Reporter: Xiangrui Meng
>Assignee: Hyukjin Kwon
>Priority: Blocker
> Attachments: Screen Shot 2018-06-12 at 8.23.18 AM.png, Screen Shot 
> 2018-06-12 at 8.23.29 AM.png, image-2018-06-13-15-15-51-025.png, 
> pyspark-ml-doc-utuntu18.04-python2.7-sphinx-1.7.5.png
>
>
> I generated the Python docs from master locally using `make html`. However, the 
> generated HTML doc doesn't render class docs correctly. I attached 
> screenshots from the Spark 2.3 docs and from the master docs generated locally. Not 
> sure if this is because of my local setup.
> cc: [~dongjoon] Could you help verify?
>  
> The following is the status of our released docs. Some recent docs seem to be 
> broken.
> *2.1.x*
> (O) 
> [https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (O) 
> [https://spark.apache.org/docs/2.1.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.1.2/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.2.x*
> (O) 
> [https://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.2.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.3.x*
> (O) 
> [https://spark.apache.org/docs/2.3.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]






[jira] [Commented] (SPARK-24535) Fix java version parsing in SparkR on Windows

2018-07-03 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532145#comment-16532145
 ] 

Saisai Shao commented on SPARK-24535:
-

Hi [~felixcheung], what's the current status of this JIRA? Do you have an ETA 
for it?

> Fix java version parsing in SparkR on Windows
> -
>
> Key: SPARK-24535
> URL: https://issues.apache.org/jira/browse/SPARK-24535
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Shivaram Venkataraman
>Assignee: Felix Cheung
>Priority: Blocker
>
> We see errors on CRAN of the form 
> {code:java}
>   java version "1.8.0_144"
>   Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
>   Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>   Picked up _JAVA_OPTIONS: -XX:-UsePerfData 
>   -- 1. Error: create DataFrame from list or data.frame (@test_basic.R#21)  
> --
>   subscript out of bounds
>   1: sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
> sparkConfig = sparkRTestConfig) at 
> D:/temp/RtmpIJ8Cc3/RLIBS_3242c713c3181/SparkR/tests/testthat/test_basic.R:21
>   2: sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, 
> sparkExecutorEnvMap, 
>  sparkJars, sparkPackages)
>   3: checkJavaVersion()
>   4: strsplit(javaVersionFilter[[1]], "[\"]")
> {code}
> The complete log file is at 
> http://home.apache.org/~shivaram/SparkR_2.3.1_check_results/Windows/00check.log






[jira] [Commented] (SPARK-24530) Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken

2018-07-03 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532143#comment-16532143
 ] 

Saisai Shao commented on SPARK-24530:
-

Hi [~hyukjin.kwon], what is the current status of this JIRA? Do you have an ETA 
for it?

> Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) 
> and pyspark.ml docs are broken
> ---
>
> Key: SPARK-24530
> URL: https://issues.apache.org/jira/browse/SPARK-24530
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.4.0
>Reporter: Xiangrui Meng
>Assignee: Hyukjin Kwon
>Priority: Blocker
> Attachments: Screen Shot 2018-06-12 at 8.23.18 AM.png, Screen Shot 
> 2018-06-12 at 8.23.29 AM.png, image-2018-06-13-15-15-51-025.png, 
> pyspark-ml-doc-utuntu18.04-python2.7-sphinx-1.7.5.png
>
>
> I generated the Python docs from master locally using `make html`. However, the 
> generated HTML doc doesn't render class docs correctly. I attached 
> screenshots from the Spark 2.3 docs and from the master docs generated locally. Not 
> sure if this is because of my local setup.
> cc: [~dongjoon] Could you help verify?
>  
> The following is the status of our released docs. Some recent docs seem to be 
> broken.
> *2.1.x*
> (O) 
> [https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (O) 
> [https://spark.apache.org/docs/2.1.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.1.2/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.2.x*
> (O) 
> [https://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.2.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.3.x*
> (O) 
> [https://spark.apache.org/docs/2.3.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]






[jira] [Assigned] (SPARK-23698) Spark code contains numerous undefined names in Python 3

2018-07-03 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-23698:


Assignee: cclauss

> Spark code contains numerous undefined names in Python 3
> 
>
> Key: SPARK-23698
> URL: https://issues.apache.org/jira/browse/SPARK-23698
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: cclauss
>Assignee: cclauss
>Priority: Minor
> Fix For: 2.4.0
>
>
> flake8 testing of https://github.com/apache/spark on Python 3.6.3
> $ *flake8 . --count --select=E901,E999,F821,F822,F823 --show-source 
> --statistics*
> ./dev/merge_spark_pr.py:98:14: F821 undefined name 'raw_input'
> result = raw_input("\n%s (y/n): " % prompt)
>  ^
> ./dev/merge_spark_pr.py:136:22: F821 undefined name 'raw_input'
> primary_author = raw_input(
>  ^
> ./dev/merge_spark_pr.py:186:16: F821 undefined name 'raw_input'
> pick_ref = raw_input("Enter a branch name [%s]: " % default_branch)
>^
> ./dev/merge_spark_pr.py:233:15: F821 undefined name 'raw_input'
> jira_id = raw_input("Enter a JIRA id [%s]: " % default_jira_id)
>   ^
> ./dev/merge_spark_pr.py:278:20: F821 undefined name 'raw_input'
> fix_versions = raw_input("Enter comma-separated fix version(s) [%s]: " % 
> default_fix_versions)
>^
> ./dev/merge_spark_pr.py:317:28: F821 undefined name 'raw_input'
> raw_assignee = raw_input(
>^
> ./dev/merge_spark_pr.py:430:14: F821 undefined name 'raw_input'
> pr_num = raw_input("Which pull request would you like to merge? (e.g. 
> 34): ")
>  ^
> ./dev/merge_spark_pr.py:442:18: F821 undefined name 'raw_input'
> result = raw_input("Would you like to use the modified title? (y/n): 
> ")
>  ^
> ./dev/merge_spark_pr.py:493:11: F821 undefined name 'raw_input'
> while raw_input("\n%s (y/n): " % pick_prompt).lower() == "y":
>   ^
> ./dev/create-release/releaseutils.py:58:16: F821 undefined name 'raw_input'
> response = raw_input("%s [y/n]: " % msg)
>^
> ./dev/create-release/releaseutils.py:152:38: F821 undefined name 'unicode'
> author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
>  ^
> ./python/setup.py:37:11: F821 undefined name '__version__'
> VERSION = __version__
>   ^
> ./python/pyspark/cloudpickle.py:275:18: F821 undefined name 'buffer'
> dispatch[buffer] = save_buffer
>  ^
> ./python/pyspark/cloudpickle.py:807:18: F821 undefined name 'file'
> dispatch[file] = save_file
>  ^
> ./python/pyspark/sql/conf.py:61:61: F821 undefined name 'unicode'
> if not isinstance(obj, str) and not isinstance(obj, unicode):
> ^
> ./python/pyspark/sql/streaming.py:25:21: F821 undefined name 'long'
> intlike = (int, long)
> ^
> ./python/pyspark/streaming/dstream.py:405:35: F821 undefined name 'long'
> return self._sc._jvm.Time(long(timestamp * 1000))
>   ^
> ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:21:10: F821 
> undefined name 'xrange'
> for i in xrange(50):
>  ^
> ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:22:14: F821 
> undefined name 'xrange'
> for j in xrange(5):
>  ^
> ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:23:18: F821 
> undefined name 'xrange'
> for k in xrange(20022):
>  ^
> 20F821 undefined name 'raw_input'
> 20
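For context, the usual way such names are handled is a generic Python 2/3 compatibility shim like the sketch below; this is an illustration of the pattern, not the actual patch merged for this issue.
{code:python}
import sys

# Under Python 3, bind the Python 2 names flake8 reports as undefined
# (raw_input and xrange) to their Python 3 equivalents.
if sys.version_info[0] >= 3:
    raw_input = input
    xrange = range

name = raw_input("Enter a branch name: ")
for i in xrange(3):
    print("%d %s" % (i, name))
{code}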






[jira] [Resolved] (SPARK-23698) Spark code contains numerous undefined names in Python 3

2018-07-03 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-23698.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21702
[https://github.com/apache/spark/pull/21702]

> Spark code contains numerous undefined names in Python 3
> 
>
> Key: SPARK-23698
> URL: https://issues.apache.org/jira/browse/SPARK-23698
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: cclauss
>Assignee: cclauss
>Priority: Minor
> Fix For: 2.4.0
>
>
> flake8 testing of https://github.com/apache/spark on Python 3.6.3
> $ *flake8 . --count --select=E901,E999,F821,F822,F823 --show-source 
> --statistics*
> ./dev/merge_spark_pr.py:98:14: F821 undefined name 'raw_input'
> result = raw_input("\n%s (y/n): " % prompt)
>  ^
> ./dev/merge_spark_pr.py:136:22: F821 undefined name 'raw_input'
> primary_author = raw_input(
>  ^
> ./dev/merge_spark_pr.py:186:16: F821 undefined name 'raw_input'
> pick_ref = raw_input("Enter a branch name [%s]: " % default_branch)
>^
> ./dev/merge_spark_pr.py:233:15: F821 undefined name 'raw_input'
> jira_id = raw_input("Enter a JIRA id [%s]: " % default_jira_id)
>   ^
> ./dev/merge_spark_pr.py:278:20: F821 undefined name 'raw_input'
> fix_versions = raw_input("Enter comma-separated fix version(s) [%s]: " % 
> default_fix_versions)
>^
> ./dev/merge_spark_pr.py:317:28: F821 undefined name 'raw_input'
> raw_assignee = raw_input(
>^
> ./dev/merge_spark_pr.py:430:14: F821 undefined name 'raw_input'
> pr_num = raw_input("Which pull request would you like to merge? (e.g. 
> 34): ")
>  ^
> ./dev/merge_spark_pr.py:442:18: F821 undefined name 'raw_input'
> result = raw_input("Would you like to use the modified title? (y/n): 
> ")
>  ^
> ./dev/merge_spark_pr.py:493:11: F821 undefined name 'raw_input'
> while raw_input("\n%s (y/n): " % pick_prompt).lower() == "y":
>   ^
> ./dev/create-release/releaseutils.py:58:16: F821 undefined name 'raw_input'
> response = raw_input("%s [y/n]: " % msg)
>^
> ./dev/create-release/releaseutils.py:152:38: F821 undefined name 'unicode'
> author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
>  ^
> ./python/setup.py:37:11: F821 undefined name '__version__'
> VERSION = __version__
>   ^
> ./python/pyspark/cloudpickle.py:275:18: F821 undefined name 'buffer'
> dispatch[buffer] = save_buffer
>  ^
> ./python/pyspark/cloudpickle.py:807:18: F821 undefined name 'file'
> dispatch[file] = save_file
>  ^
> ./python/pyspark/sql/conf.py:61:61: F821 undefined name 'unicode'
> if not isinstance(obj, str) and not isinstance(obj, unicode):
> ^
> ./python/pyspark/sql/streaming.py:25:21: F821 undefined name 'long'
> intlike = (int, long)
> ^
> ./python/pyspark/streaming/dstream.py:405:35: F821 undefined name 'long'
> return self._sc._jvm.Time(long(timestamp * 1000))
>   ^
> ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:21:10: F821 
> undefined name 'xrange'
> for i in xrange(50):
>  ^
> ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:22:14: F821 
> undefined name 'xrange'
> for j in xrange(5):
>  ^
> ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:23:18: F821 
> undefined name 'xrange'
> for k in xrange(20022):
>  ^
> 20F821 undefined name 'raw_input'
> 20






[jira] [Resolved] (SPARK-24709) Inferring schema from JSON string literal

2018-07-03 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-24709.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21686
[https://github.com/apache/spark/pull/21686]

> Inferring schema from JSON string literal
> -
>
> Key: SPARK-24709
> URL: https://issues.apache.org/jira/browse/SPARK-24709
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 2.4.0
>
>
> We need to add a new function, *schema_of_json()*, which infers the 
> schema of a JSON string literal. The result of the function is a schema in DDL 
> format.
> One of the use cases is passing the output of _schema_of_json()_ to 
> *from_json()*. Currently, the _from_json()_ function requires a schema as a 
> mandatory argument, and a user has to pass the schema as a string literal in SQL. 
> The new function should allow inferring the schema from an example. Say 
> json_col is a column whose JSON strings all share the same schema; it should 
> be possible to pass one JSON string with that schema to _schema_of_json()_, 
> which infers the schema from that example.
> {code:sql}
> select from_json(json_col, schema_of_json('{"f1": 0, "f2": [0], "f2": "a"}'))
> from json_table;
> {code}






[jira] [Assigned] (SPARK-24709) Inferring schema from JSON string literal

2018-07-03 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-24709:


Assignee: Maxim Gekk

> Inferring schema from JSON string literal
> -
>
> Key: SPARK-24709
> URL: https://issues.apache.org/jira/browse/SPARK-24709
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 2.4.0
>
>
> We need to add a new function, *schema_of_json()*, which infers the 
> schema of a JSON string literal. The result of the function is a schema in DDL 
> format.
> One of the use cases is passing the output of _schema_of_json()_ to 
> *from_json()*. Currently, the _from_json()_ function requires a schema as a 
> mandatory argument, and a user has to pass the schema as a string literal in SQL. 
> The new function should allow inferring the schema from an example. Say 
> json_col is a column whose JSON strings all share the same schema; it should 
> be possible to pass one JSON string with that schema to _schema_of_json()_, 
> which infers the schema from that example.
> {code:sql}
> select from_json(json_col, schema_of_json('{"f1": 0, "f2": [0], "f2": "a"}'))
> from json_table;
> {code}






[jira] [Assigned] (SPARK-5152) Let metrics.properties file take an hdfs:// path

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-5152:
---

Assignee: Apache Spark

> Let metrics.properties file take an hdfs:// path
> 
>
> Key: SPARK-5152
> URL: https://issues.apache.org/jira/browse/SPARK-5152
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Ryan Williams
>Assignee: Apache Spark
>Priority: Major
>
> From my reading of [the 
> code|https://github.com/apache/spark/blob/06dc4b5206a578065ebbb6bb8d54246ca007397f/core/src/main/scala/org/apache/spark/metrics/MetricsConfig.scala#L53],
>  the {{spark.metrics.conf}} property must be a path that is resolvable on the 
> local filesystem of each executor.
> Running a Spark job with {{--conf 
> spark.metrics.conf=hdfs://host1.domain.com/path/metrics.properties}} logs 
> many errors (~1 per executor, presumably?) like:
> {code}
> 15/01/08 13:20:57 ERROR metrics.MetricsConfig: Error loading configure file
> java.io.FileNotFoundException: hdfs:/host1.domain.com/path/metrics.properties 
> (No such file or directory)
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.(FileInputStream.java:146)
> at java.io.FileInputStream.(FileInputStream.java:101)
> at 
> org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:53)
> at 
> org.apache.spark.metrics.MetricsSystem.(MetricsSystem.scala:92)
> at 
> org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:218)
> at org.apache.spark.SparkEnv$.create(SparkEnv.scala:329)
> at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:181)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:131)
> at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
> at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> {code}
> which seems consistent with the idea that it's looking on the local 
> filesystem and not parsing the "scheme" portion of the URL.
> Letting all executors get their {{metrics.properties}} files from one 
> location on HDFS would be an improvement, right?






[jira] [Assigned] (SPARK-5152) Let metrics.properties file take an hdfs:// path

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-5152:
---

Assignee: (was: Apache Spark)

> Let metrics.properties file take an hdfs:// path
> 
>
> Key: SPARK-5152
> URL: https://issues.apache.org/jira/browse/SPARK-5152
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Ryan Williams
>Priority: Major
>
> From my reading of [the 
> code|https://github.com/apache/spark/blob/06dc4b5206a578065ebbb6bb8d54246ca007397f/core/src/main/scala/org/apache/spark/metrics/MetricsConfig.scala#L53],
>  the {{spark.metrics.conf}} property must be a path that is resolvable on the 
> local filesystem of each executor.
> Running a Spark job with {{--conf 
> spark.metrics.conf=hdfs://host1.domain.com/path/metrics.properties}} logs 
> many errors (~1 per executor, presumably?) like:
> {code}
> 15/01/08 13:20:57 ERROR metrics.MetricsConfig: Error loading configure file
> java.io.FileNotFoundException: hdfs:/host1.domain.com/path/metrics.properties 
> (No such file or directory)
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.(FileInputStream.java:146)
> at java.io.FileInputStream.(FileInputStream.java:101)
> at 
> org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:53)
> at 
> org.apache.spark.metrics.MetricsSystem.(MetricsSystem.scala:92)
> at 
> org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:218)
> at org.apache.spark.SparkEnv$.create(SparkEnv.scala:329)
> at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:181)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:131)
> at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
> at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> {code}
> which seems consistent with the idea that it's looking on the local 
> filesystem and not parsing the "scheme" portion of the URL.
> Letting all executors get their {{metrics.properties}} files from one 
> location on HDFS would be an improvement, right?






[jira] [Commented] (SPARK-5152) Let metrics.properties file take an hdfs:// path

2018-07-03 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532102#comment-16532102
 ] 

Apache Spark commented on SPARK-5152:
-

User 'jzhuge' has created a pull request for this issue:
https://github.com/apache/spark/pull/21709

> Let metrics.properties file take an hdfs:// path
> 
>
> Key: SPARK-5152
> URL: https://issues.apache.org/jira/browse/SPARK-5152
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Ryan Williams
>Priority: Major
>
> From my reading of [the 
> code|https://github.com/apache/spark/blob/06dc4b5206a578065ebbb6bb8d54246ca007397f/core/src/main/scala/org/apache/spark/metrics/MetricsConfig.scala#L53],
>  the {{spark.metrics.conf}} property must be a path that is resolvable on the 
> local filesystem of each executor.
> Running a Spark job with {{--conf 
> spark.metrics.conf=hdfs://host1.domain.com/path/metrics.properties}} logs 
> many errors (~1 per executor, presumably?) like:
> {code}
> 15/01/08 13:20:57 ERROR metrics.MetricsConfig: Error loading configure file
> java.io.FileNotFoundException: hdfs:/host1.domain.com/path/metrics.properties 
> (No such file or directory)
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.(FileInputStream.java:146)
> at java.io.FileInputStream.(FileInputStream.java:101)
> at 
> org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:53)
> at 
> org.apache.spark.metrics.MetricsSystem.(MetricsSystem.scala:92)
> at 
> org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:218)
> at org.apache.spark.SparkEnv$.create(SparkEnv.scala:329)
> at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:181)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:131)
> at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
> at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> {code}
> which seems consistent with the idea that it's looking on the local 
> filesystem and not parsing the "scheme" portion of the URL.
> Letting all executors get their {{metrics.properties}} files from one 
> location on HDFS would be an improvement, right?






[jira] [Resolved] (SPARK-9883) Distance to each cluster given a point (KMeansModel)

2018-07-03 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-9883.
--
Resolution: Won't Fix

> Distance to each cluster given a point (KMeansModel)
> 
>
> Key: SPARK-9883
> URL: https://issues.apache.org/jira/browse/SPARK-9883
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Bertrand Dechoux
>Priority: Minor
>
> Right now KMeansModel provides only a 'predict' method, which returns the 
> index of the closest cluster.
> https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeansModel.html#predict(org.apache.spark.mllib.linalg.Vector)
> It would be nice to have a method giving the distance to all clusters.
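Until such an API exists, a small sketch (PySpark MLlib, illustrative data) of computing the distance from a point to every cluster center using what the model already exposes:
{code:python}
import numpy as np
from pyspark.sql import SparkSession
from pyspark.mllib.clustering import KMeans

spark = SparkSession.builder.master("local[1]").appName("kmeans-distances").getOrCreate()
sc = spark.sparkContext

data = sc.parallelize([[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]])
model = KMeans.train(data, k=2, maxIterations=10)

point = np.array([1.0, 0.0])
# Euclidean distance to each cluster center, in cluster-index order.
distances = [float(np.linalg.norm(point - np.asarray(center)))
             for center in model.clusterCenters]
print(distances)
print(int(np.argmin(distances)) == model.predict(point))  # consistent with predict()
{code}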



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7856) Scalable PCA implementation for tall and fat matrices

2018-07-03 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-7856.
--
Resolution: Won't Fix

See PR

> Scalable PCA implementation for tall and fat matrices
> -
>
> Key: SPARK-7856
> URL: https://issues.apache.org/jira/browse/SPARK-7856
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Tarek Elgamal
>Priority: Major
>
> Currently the PCA implementation has the limitation of fitting d^2 
> covariance/Gramian matrix entries in memory (d is the number of 
> columns/dimensions of the matrix). We often need only the largest k principal 
> components. To make PCA really scalable, I suggest an implementation where 
> the memory usage is proportional to the number of principal components k 
> rather than the full dimensionality d. 
> I suggest adopting the solution described in this paper that is published in 
> SIGMOD 2015 (http://ds.qcri.org/images/profile/tarek_elgamal/sigmod2015.pdf). 
> The paper offers an implementation for Probabilistic PCA (PPCA) which has 
> less memory and time complexity and could potentially scale to tall and fat 
> matrices rather than only the tall and skinny matrices supported by the 
> current PCA implementation. 
> Probabilistic PCA could potentially be added to the set of algorithms 
> supported by MLlib; it does not necessarily replace the old PCA 
> implementation.
> PPCA implementation is adopted in Matlab's Statistics and Machine Learning 
> Toolbox (http://www.mathworks.com/help/stats/ppca.html)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24735) Improve exception when mixing pandas_udf types

2018-07-03 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-24735:


 Summary: Improve exception when mixing pandas_udf types
 Key: SPARK-24735
 URL: https://issues.apache.org/jira/browse/SPARK-24735
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 2.3.0
Reporter: Bryan Cutler


From the discussion here 
https://github.com/apache/spark/pull/21650#discussion_r199203674, mixing up 
Pandas UDF types, like using GROUPED_MAP as a SCALAR {{foo = pandas_udf(lambda 
x: x, 'v int', PandasUDFType.GROUPED_MAP)}}, produces an exception which is 
hard to understand.  It should tell the user that the UDF type is wrong.  This 
is the full output:

{code}
>>> foo = pandas_udf(lambda x: x, 'v int', PandasUDFType.GROUPED_MAP)
>>> df.select(foo(df['v'])).show()
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/icexelloss/workspace/upstream/spark/python/pyspark/sql/dataframe.py", 
line 353, in show
print(self._jdf.showString(n, 20, vertical))
  File 
"/Users/icexelloss/workspace/upstream/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
 line 1257, in __call__
  File 
"/Users/icexelloss/workspace/upstream/spark/python/pyspark/sql/utils.py", line 
63, in deco
return f(*a, **kw)
  File 
"/Users/icexelloss/workspace/upstream/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py",
 line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o257.showString.
: java.lang.UnsupportedOperationException: Cannot evaluate expression: 
(input[0, bigint, false])
at 
org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:261)
at 
org.apache.spark.sql.catalyst.expressions.PythonUDF.doGenCode(PythonUDF.scala:50)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:108)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:105)
at scala.Option.getOrElse(Option.scala:121)
...
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24579) SPIP: Standardize Optimized Data Exchange between Spark and DL/AI frameworks

2018-07-03 Thread Xiangrui Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531737#comment-16531737
 ] 

Xiangrui Meng commented on SPARK-24579:
---

[~sethah] [~kiszk] [~rxin] Please request comment permissions on the doc. I 
didn't give everyone comment permissions by default to avoid spam. If I 
addressed a comment, it will be reflected in the current version of the doc.

You can also post comments here.

> SPIP: Standardize Optimized Data Exchange between Spark and DL/AI frameworks
> 
>
> Key: SPARK-24579
> URL: https://issues.apache.org/jira/browse/SPARK-24579
> Project: Spark
>  Issue Type: Epic
>  Components: ML, PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Major
>  Labels: Hydrogen
> Attachments: [SPARK-24579] SPIP_ Standardize Optimized Data Exchange 
> between Apache Spark and DL%2FAI Frameworks .pdf
>
>
> (see attached SPIP pdf for more details)
> At the crossroads of big data and AI, we see both the success of Apache Spark 
> as a unified
> analytics engine and the rise of AI frameworks like TensorFlow and Apache 
> MXNet (incubating).
> Both big data and AI are indispensable components to drive business 
> innovation and there have
> been multiple attempts from both communities to bring them together.
> We saw efforts from the AI community to implement data solutions for AI 
> frameworks like tf.data and tf.Transform. However, with 50+ data sources and 
> built-in SQL, DataFrames, and Streaming features, Spark remains the community 
> choice for big data. This is why we saw many efforts to integrate DL/AI 
> frameworks with Spark to leverage its power, for example, the TFRecords data 
> source for Spark, TensorFlowOnSpark, TensorFrames, etc. As part of Project 
> Hydrogen, this SPIP takes a different angle at Spark + AI unification.
> None of the integrations are possible without exchanging data between Spark 
> and external DL/AI frameworks, and the performance matters. However, there 
> is no standard way to exchange data, so implementation and performance 
> optimization efforts are fragmented. For example, TensorFlowOnSpark uses 
> Hadoop InputFormat/OutputFormat for TensorFlow’s TFRecords to load and 
> save data and passes the RDD records to TensorFlow in Python. And TensorFrames 
> converts Spark DataFrame Rows to/from TensorFlow Tensors using TensorFlow’s 
> Java API. How can we reduce the complexity?
> The proposal here is to standardize the data exchange interface (or format) 
> between Spark and DL/AI frameworks and optimize data conversion from/to this 
> interface.  So DL/AI frameworks can leverage Spark to load data virtually 
> from anywhere without spending extra effort building complex data solutions, 
> like reading features from a production data warehouse or streaming model 
> inference. Spark users can use DL/AI frameworks without learning specific 
> data APIs implemented there. And developers from both sides can work on 
> performance optimizations independently given the interface itself doesn’t 
> introduce big overhead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20202) Remove references to org.spark-project.hive

2018-07-03 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531706#comment-16531706
 ] 

Hyukjin Kwon commented on SPARK-20202:
--

Kindly pinging [~owen.omalley] and [~rxin]. I would like to make further 
progress on this; it has been blocked for a while, and it is pretty important 
that we sort this out. 

> Remove references to org.spark-project.hive
> ---
>
> Key: SPARK-20202
> URL: https://issues.apache.org/jira/browse/SPARK-20202
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 1.6.4, 2.0.3, 2.1.1
>Reporter: Owen O'Malley
>Priority: Major
>
> Spark can't continue to depend on their fork of Hive and must move to 
> standard Hive versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24421) sun.misc.Unsafe in JDK9+

2018-07-03 Thread DB Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531705#comment-16531705
 ] 

DB Tsai commented on SPARK-24421:
-

In JDK9+, `sun.misc.Unsafe` is private. We can either access it through 
reflection or add a JVM flag to make it accessible.
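
A minimal sketch of the reflection route (essentially what Spark's Platform 
class already does to obtain the {{theUnsafe}} instance):

{code:scala}
import sun.misc.Unsafe

// Grab the singleton Unsafe instance via reflection instead of calling
// Unsafe.getUnsafe(), which rejects untrusted callers.
val unsafeField = classOf[Unsafe].getDeclaredField("theUnsafe")
unsafeField.setAccessible(true)
val unsafe = unsafeField.get(null).asInstanceOf[Unsafe]
{code}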

> sun.misc.Unsafe in JDK9+
> 
>
> Key: SPARK-24421
> URL: https://issues.apache.org/jira/browse/SPARK-24421
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Priority: Major
>
> Many internal APIs such as unsafe are encapsulated in JDK9+, see 
> http://openjdk.java.net/jeps/260 for detail.
> To use Unsafe, we need to add *jdk.unsupported* to our code’s module 
> declaration:
> {code:java}
> module java9unsafe {
> requires jdk.unsupported;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24420) Upgrade ASM to 6.x to support JDK9+

2018-07-03 Thread DB Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai resolved SPARK-24420.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

> Upgrade ASM to 6.x to support JDK9+
> ---
>
> Key: SPARK-24420
> URL: https://issues.apache.org/jira/browse/SPARK-24420
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Assignee: DB Tsai
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-24702) Unable to cast to calendar interval in spark sql.

2018-07-03 Thread Daniel Mateus Pires (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Mateus Pires updated SPARK-24702:

Comment: was deleted

(was: https://github.com/apache/spark/pull/21706)

> Unable to cast to calendar interval in spark sql.
> -
>
> Key: SPARK-24702
> URL: https://issues.apache.org/jira/browse/SPARK-24702
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Priyanka Garg
>Priority: Major
>
> when I am trying to cast string type to calendar interval type, I am getting 
> the following error:
> spark.sql("select cast(cast(interval '1' day as string) as 
> calendarinterval)").show()
> ^^^
>  
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1673)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:49)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$PrimitiveDataTypeContext.accept(SqlBaseParser.java:13779)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:55)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.org$apache$spark$sql$catalyst$parser$AstBuilder$$visitSparkDataType(AstBuilde



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24207) PrefixSpan: R API

2018-07-03 Thread Huaxin Gao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531498#comment-16531498
 ] 

Huaxin Gao commented on SPARK-24207:


I am working on this and will submit a PR soon. 
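
For reference, a minimal sketch of the existing Scala API that an R wrapper 
would presumably expose (shown here with the mllib implementation; {{sc}} is an 
existing SparkContext, and the R-side function names are not decided here):

{code:scala}
import org.apache.spark.mllib.fpm.PrefixSpan

// Each sequence is an array of itemsets; each itemset is an array of items.
val sequences = sc.parallelize(Seq(
  Array(Array(1, 2), Array(3)),
  Array(Array(1), Array(3, 2), Array(1, 2))
), 2).cache()

val model = new PrefixSpan()
  .setMinSupport(0.5)
  .setMaxPatternLength(5)
  .run(sequences)

model.freqSequences.collect().foreach { fs =>
  println(s"${fs.sequence.map(_.mkString("[", ", ", "]")).mkString(", ")}: ${fs.freq}")
}
{code}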

> PrefixSpan: R API
> -
>
> Key: SPARK-24207
> URL: https://issues.apache.org/jira/browse/SPARK-24207
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24702) Unable to cast to calendar interval in spark sql.

2018-07-03 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531481#comment-16531481
 ] 

Apache Spark commented on SPARK-24702:
--

User 'dmateusp' has created a pull request for this issue:
https://github.com/apache/spark/pull/21706

> Unable to cast to calendar interval in spark sql.
> -
>
> Key: SPARK-24702
> URL: https://issues.apache.org/jira/browse/SPARK-24702
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Priyanka Garg
>Priority: Major
>
> when I am trying to cast string type to calendar interval type, I am getting 
> the following error:
> spark.sql("select cast(cast(interval '1' day as string) as 
> calendarinterval)").show()
> ^^^
>  
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1673)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:49)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$PrimitiveDataTypeContext.accept(SqlBaseParser.java:13779)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:55)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.org$apache$spark$sql$catalyst$parser$AstBuilder$$visitSparkDataType(AstBuilde



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24702) Unable to cast to calendar interval in spark sql.

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24702:


Assignee: Apache Spark

> Unable to cast to calendar interval in spark sql.
> -
>
> Key: SPARK-24702
> URL: https://issues.apache.org/jira/browse/SPARK-24702
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Priyanka Garg
>Assignee: Apache Spark
>Priority: Major
>
> when I am trying to cast string type to calendar interval type, I am getting 
> the following error:
> spark.sql("select cast(cast(interval '1' day as string) as 
> calendarinterval)").show()
> ^^^
>  
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1673)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:49)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$PrimitiveDataTypeContext.accept(SqlBaseParser.java:13779)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:55)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.org$apache$spark$sql$catalyst$parser$AstBuilder$$visitSparkDataType(AstBuilde



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24702) Unable to cast to calendar interval in spark sql.

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24702:


Assignee: (was: Apache Spark)

> Unable to cast to calendar interval in spark sql.
> -
>
> Key: SPARK-24702
> URL: https://issues.apache.org/jira/browse/SPARK-24702
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Priyanka Garg
>Priority: Major
>
> when I am trying to cast string type to calendar interval type, I am getting 
> the following error:
> spark.sql("select cast(cast(interval '1' day as string) as 
> calendarinterval)").show()
> ^^^
>  
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1673)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:49)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$PrimitiveDataTypeContext.accept(SqlBaseParser.java:13779)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:55)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.org$apache$spark$sql$catalyst$parser$AstBuilder$$visitSparkDataType(AstBuilde



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24702) Unable to cast to calendar interval in spark sql.

2018-07-03 Thread Daniel Mateus Pires (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531480#comment-16531480
 ] 

Daniel Mateus Pires commented on SPARK-24702:
-

https://github.com/apache/spark/pull/21706

> Unable to cast to calendar interval in spark sql.
> -
>
> Key: SPARK-24702
> URL: https://issues.apache.org/jira/browse/SPARK-24702
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Priyanka Garg
>Priority: Major
>
> when I am trying to cast string type to calendar interval type, I am getting 
> the following error:
> spark.sql("select cast(cast(interval '1' day as string) as 
> calendarinterval)").show()
> ^^^
>  
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1673)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:49)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$PrimitiveDataTypeContext.accept(SqlBaseParser.java:13779)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:55)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.org$apache$spark$sql$catalyst$parser$AstBuilder$$visitSparkDataType(AstBuilde



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24702) Unable to cast to calendar interval in spark sql.

2018-07-03 Thread Daniel Mateus Pires (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531474#comment-16531474
 ] 

Daniel Mateus Pires commented on SPARK-24702:
-

Got it working, I'll open a PR

> Unable to cast to calendar interval in spark sql.
> -
>
> Key: SPARK-24702
> URL: https://issues.apache.org/jira/browse/SPARK-24702
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Priyanka Garg
>Priority: Major
>
> when I am trying to cast string type to calendar interval type, I am getting 
> the following error:
> spark.sql("select cast(cast(interval '1' day as string) as 
> calendarinterval)").show()
> ^^^
>  
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1673)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:49)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$PrimitiveDataTypeContext.accept(SqlBaseParser.java:13779)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:55)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.org$apache$spark$sql$catalyst$parser$AstBuilder$$visitSparkDataType(AstBuilde



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24702) Unable to cast to calendar interval in spark sql.

2018-07-03 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531388#comment-16531388
 ] 

Takeshi Yamamuro edited comment on SPARK-24702 at 7/3/18 1:36 PM:
--

In postgresql, we can do so;
{code}
postgres=# select CAST(CAST(interval '1 hour' AS varchar) AS interval);
 interval 
--
 01:00:00
(1 row)
{code}
But, not sure this cast is meaningful.


was (Author: maropu):
In postgresql, we can do so;
{code}
postgres=# select CAST(CAST(interval '1 hour' AS varchar) AS interval);
 interval 
--
 01:00:00
(1 row)
{code}
But, not sure this cast is meaningful.

> Unable to cast to calendar interval in spark sql.
> -
>
> Key: SPARK-24702
> URL: https://issues.apache.org/jira/browse/SPARK-24702
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Priyanka Garg
>Priority: Major
>
> when I am trying to cast string type to calendar interval type, I am getting 
> the following error:
> spark.sql("select cast(cast(interval '1' day as string) as 
> calendarinterval)").show()
> ^^^
>  
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1673)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:49)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$PrimitiveDataTypeContext.accept(SqlBaseParser.java:13779)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:55)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.org$apache$spark$sql$catalyst$parser$AstBuilder$$visitSparkDataType(AstBuilde



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24702) Unable to cast to calendar interval in spark sql.

2018-07-03 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531388#comment-16531388
 ] 

Takeshi Yamamuro commented on SPARK-24702:
--

In postgresql, we can do so;
{code}
postgres=# select CAST(CAST(interval '1 hour' AS varchar) AS interval);
 interval 
--
 01:00:00
(1 row)
{code}
But, not sure this cast is meaningful.

> Unable to cast to calendar interval in spark sql.
> -
>
> Key: SPARK-24702
> URL: https://issues.apache.org/jira/browse/SPARK-24702
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Priyanka Garg
>Priority: Major
>
> when I am trying to cast string type to calendar interval type, I am getting 
> the following error:
> spark.sql("select cast(cast(interval '1' day as string) as 
> calendarinterval)").show()
> ^^^
>  
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1673)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:49)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$PrimitiveDataTypeContext.accept(SqlBaseParser.java:13779)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:55)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.org$apache$spark$sql$catalyst$parser$AstBuilder$$visitSparkDataType(AstBuilde



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24702) Unable to cast to calendar interval in spark sql.

2018-07-03 Thread Daniel Mateus Pires (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531379#comment-16531379
 ] 

Daniel Mateus Pires commented on SPARK-24702:
-

The error is:

DataType calendarinterval is not supported.(line 1, pos 48)

== SQL ==
select cast(cast(interval '1' day as string) as calendarinterval)

I was able to reproduce it in 2.4.0. I can see CalendarIntervalType inside 
org.apache.spark.sql.types, so the type itself definitely exists; looking into 
it more (but I'm very new to the codebase). 

> Unable to cast to calendar interval in spark sql.
> -
>
> Key: SPARK-24702
> URL: https://issues.apache.org/jira/browse/SPARK-24702
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Priyanka Garg
>Priority: Major
>
> when I am trying to cast string type to calendar interval type, I am getting 
> the following error:
> spark.sql("select cast(cast(interval '1' day as string) as 
> calendarinterval)").show()
> ^^^
>  
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1673)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitPrimitiveDataType$1.apply(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:1651)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:49)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$PrimitiveDataTypeContext.accept(SqlBaseParser.java:13779)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:55)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.org$apache$spark$sql$catalyst$parser$AstBuilder$$visitSparkDataType(AstBuilde



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24727:


Assignee: (was: Apache Spark)

> The cache 100 in CodeGenerator is too small for streaming
> -
>
> Key: SPARK-24727
> URL: https://issues.apache.org/jira/browse/SPARK-24727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: ant_nebula
>Priority: Major
>
> {code:java}
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 
> private val cache = CacheBuilder.newBuilder().maximumSize(100).build{code}
> The cache of 100 entries in CodeGenerator is too small for real-time streaming 
> computation, although it is fine for offline computation: real-time streaming 
> computation is usually more complex within a single driver and is performance 
> sensitive.
> I suggest Spark make this user-configurable with a default of 100, e.g. 
> spark.codegen.cache=1000
>  
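
A minimal sketch of the proposal only; "spark.codegen.cache" is the key name 
suggested above, not an existing Spark conf, and the loader body is a 
placeholder standing in for the real Janino compilation step:

{code:scala}
import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}

// Read the maximum number of cached code-gen entries from a configurable
// property instead of hard-coding 100.
val maxEntries: Long = sys.props.getOrElse("spark.codegen.cache", "100").toLong

val cache: LoadingCache[String, String] = CacheBuilder.newBuilder()
  .maximumSize(maxEntries)
  .build(new CacheLoader[String, String] {
    // Placeholder: the real cache compiles `source` with Janino and caches
    // the generated class.
    override def load(source: String): String = source
  })
{code}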



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531376#comment-16531376
 ] 

Apache Spark commented on SPARK-24727:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/21705

> The cache 100 in CodeGenerator is too small for streaming
> -
>
> Key: SPARK-24727
> URL: https://issues.apache.org/jira/browse/SPARK-24727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: ant_nebula
>Priority: Major
>
> {code:java}
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 
> private val cache = CacheBuilder.newBuilder().maximumSize(100).build{code}
> The cache of 100 entries in CodeGenerator is too small for real-time streaming 
> computation, although it is fine for offline computation: real-time streaming 
> computation is usually more complex within a single driver and is performance 
> sensitive.
> I suggest Spark make this user-configurable with a default of 100, e.g. 
> spark.codegen.cache=1000
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24727:


Assignee: Apache Spark

> The cache 100 in CodeGenerator is too small for streaming
> -
>
> Key: SPARK-24727
> URL: https://issues.apache.org/jira/browse/SPARK-24727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: ant_nebula
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 
> private val cache = CacheBuilder.newBuilder().maximumSize(100).build{code}
> The cache of 100 entries in CodeGenerator is too small for real-time streaming 
> computation, although it is fine for offline computation: real-time streaming 
> computation is usually more complex within a single driver and is performance 
> sensitive.
> I suggest Spark make this user-configurable with a default of 100, e.g. 
> spark.codegen.cache=1000
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24703) Unable to multiply calendar interval with long/int

2018-07-03 Thread Daniel Mateus Pires (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531363#comment-16531363
 ] 

Daniel Mateus Pires commented on SPARK-24703:
-

was able to reproduce in 2.4.0

> Unable to multiply calendar interval with long/int
> --
>
> Key: SPARK-24703
> URL: https://issues.apache.org/jira/browse/SPARK-24703
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Priyanka Garg
>Priority: Major
>
> When I try to multiply a calendar interval by a long/int, I get the error 
> below. The same syntax is supported in Postgres.
>  spark.sql("select 3 *  interval '1' day").show()
> org.apache.spark.sql.AnalysisException: cannot resolve '(3 * interval 1 
> days)' due to data type mismatch: differing types in '(3 * interval 1 days)' 
> (int and calendarinterval).; line 1 pos 7;
> 'Project [unresolvedalias((3 * interval 1 days), None)]
> +- OneRowRelation
>  
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
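
A hedged workaround sketch while this arithmetic is unsupported: Spark does 
accept interval literals directly, so a constant multiplier can be folded into 
the literal itself instead of being multiplied at query time:

{code:scala}
// Works today: the multiplier is written into the interval literal.
spark.sql("select interval '3' day as three_days").show()
{code}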



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24422) Add JDK9+ in our Jenkins' build servers

2018-07-03 Thread Andrew Korzhuev (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531241#comment-16531241
 ] 

Andrew Korzhuev commented on SPARK-24422:
-

Also `.travis.yml` needs to be fixed in the following way:
{code:java}
# 2. Choose language and target JDKs for parallel builds.
language: java
jdk:
  - openjdk8
  - openjdk9
  - openjdk10
{code}

> Add JDK9+ in our Jenkins' build servers
> ---
>
> Key: SPARK-24422
> URL: https://issues.apache.org/jira/browse/SPARK-24422
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24422) Add JDK9+ in our Jenkins' build servers

2018-07-03 Thread Andrew Korzhuev (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531241#comment-16531241
 ] 

Andrew Korzhuev edited comment on SPARK-24422 at 7/3/18 11:46 AM:
--

Also `.travis.yml` needs to be fixed in the following way:
{code:java}
# 2. Choose language and target JDKs for parallel builds.
language: java
jdk:
  - openjdk8
  - openjdk9
{code}


was (Author: akorzhuev):
Also `.travis.yml` needs to be fixed in the following way:
{code:java}
# 2. Choose language and target JDKs for parallel builds.
language: java
jdk:
  - openjdk8
  - openjdk9
  - openjdk10
{code}

> Add JDK9+ in our Jenkins' build servers
> ---
>
> Key: SPARK-24422
> URL: https://issues.apache.org/jira/browse/SPARK-24422
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24421) sun.misc.Unsafe in JDK9+

2018-07-03 Thread Andrew Korzhuev (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531235#comment-16531235
 ] 

Andrew Korzhuev edited comment on SPARK-24421 at 7/3/18 11:44 AM:
--

If I understand this correctly, then the only deprecated JDK9+ API Spark is 
using is `sun.misc.Cleaner` (while `sun.misc.Unsafe` is still accessible) in 
`[common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java|https://github.com/andrusha/spark/commit/7da06d3c725169f9764225f5a29886eb56bee191#diff-c7483c7efce631c783676f014ba2b0ed]`,
 which is fixable in the following way:
{code:java}
@@ -22,7 +22,7 @@
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

-import sun.misc.Cleaner;
+import java.lang.ref.Cleaner;
import sun.misc.Unsafe;

public final class Platform {
@@ -169,7 +169,8 @@ public static ByteBuffer allocateDirectBuffer(int size) {
cleanerField.setAccessible(true);
long memory = allocateMemory(size);
ByteBuffer buffer = (ByteBuffer) constructor.newInstance(memory, size);
- Cleaner cleaner = Cleaner.create(buffer, () -> freeMemory(memory));
+ Cleaner cleaner = Cleaner.create();
+ cleaner.register(buffer, () -> freeMemory(memory));
cleanerField.set(buffer, cleaner);
return buffer;
{code}
[https://github.com/andrusha/spark/commit/7da06d3c725169f9764225f5a29886eb56bee191#diff-c7483c7efce631c783676f014ba2b0ed]

 


was (Author: akorzhuev):
If I understand this correctly, then the only deprecated JDK9+ API Spark is 
using is `sun.misc.Cleaner` (while `sun.misc.Unsafe` is still accessible) in 
`[common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java|https://github.com/andrusha/spark/commit/7da06d3c725169f9764225f5a29886eb56bee191#diff-c7483c7efce631c783676f014ba2b0ed]`,
 which is fixable in the following way:

 
{code:java}
@@ -22,7 +22,7 @@
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

-import sun.misc.Cleaner;
+import java.lang.ref.Cleaner;
import sun.misc.Unsafe;

public final class Platform {
@@ -169,7 +169,8 @@ public static ByteBuffer allocateDirectBuffer(int size) {
cleanerField.setAccessible(true);
long memory = allocateMemory(size);
ByteBuffer buffer = (ByteBuffer) constructor.newInstance(memory, size);
- Cleaner cleaner = Cleaner.create(buffer, () -> freeMemory(memory));
+ Cleaner cleaner = Cleaner.create();
+ cleaner.register(buffer, () -> freeMemory(memory));
cleanerField.set(buffer, cleaner);
return buffer;
{code}
[https://github.com/andrusha/spark/commit/7da06d3c725169f9764225f5a29886eb56bee191#diff-c7483c7efce631c783676f014ba2b0ed]

 

> sun.misc.Unsafe in JDK9+
> 
>
> Key: SPARK-24421
> URL: https://issues.apache.org/jira/browse/SPARK-24421
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Priority: Major
>
> Many internal APIs such as unsafe are encapsulated in JDK9+, see 
> http://openjdk.java.net/jeps/260 for detail.
> To use Unsafe, we need to add *jdk.unsupported* to our code’s module 
> declaration:
> {code:java}
> module java9unsafe {
> requires jdk.unsupported;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24421) sun.misc.Unsafe in JDK9+

2018-07-03 Thread Andrew Korzhuev (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531235#comment-16531235
 ] 

Andrew Korzhuev commented on SPARK-24421:
-

If I understand this correctly, then the only deprecated JDK9+ API Spark is 
using is `sun.misc.Cleaner` (while `sun.misc.Unsafe` is still accessible) in 
`[common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java|https://github.com/andrusha/spark/commit/7da06d3c725169f9764225f5a29886eb56bee191#diff-c7483c7efce631c783676f014ba2b0ed]`,
 which is fixable in the following way:

 
{code:java}
@@ -22,7 +22,7 @@
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

-import sun.misc.Cleaner;
+import java.lang.ref.Cleaner;
import sun.misc.Unsafe;

public final class Platform {
@@ -169,7 +169,8 @@ public static ByteBuffer allocateDirectBuffer(int size) {
cleanerField.setAccessible(true);
long memory = allocateMemory(size);
ByteBuffer buffer = (ByteBuffer) constructor.newInstance(memory, size);
- Cleaner cleaner = Cleaner.create(buffer, () -> freeMemory(memory));
+ Cleaner cleaner = Cleaner.create();
+ cleaner.register(buffer, () -> freeMemory(memory));
cleanerField.set(buffer, cleaner);
return buffer;
{code}
[https://github.com/andrusha/spark/commit/7da06d3c725169f9764225f5a29886eb56bee191#diff-c7483c7efce631c783676f014ba2b0ed]

 

> sun.misc.Unsafe in JDK9+
> 
>
> Key: SPARK-24421
> URL: https://issues.apache.org/jira/browse/SPARK-24421
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Priority: Major
>
> Many internal APIs such as unsafe are encapsulated in JDK9+, see 
> http://openjdk.java.net/jeps/260 for detail.
> To use Unsafe, we need to add *jdk.unsupported* to our code’s module 
> declaration:
> {code:java}
> module java9unsafe {
> requires jdk.unsupported;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24734) Fix containsNull of Concat for array type.

2018-07-03 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531229#comment-16531229
 ] 

Apache Spark commented on SPARK-24734:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/21704

> Fix containsNull of Concat for array type.
> --
>
> Key: SPARK-24734
> URL: https://issues.apache.org/jira/browse/SPARK-24734
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently {{Concat}} for array type uses the data type of the first child as 
> its own data type, but the children might include an array containing nulls.
> We should take the nullabilities of all children into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24734) Fix containsNull of Concat for array type.

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24734:


Assignee: (was: Apache Spark)

> Fix containsNull of Concat for array type.
> --
>
> Key: SPARK-24734
> URL: https://issues.apache.org/jira/browse/SPARK-24734
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently {{Concat}} for array type uses the data type of the first child as 
> its own data type, but the children might include an array containing nulls.
> We should take the nullabilities of all children into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24734) Fix containsNull of Concat for array type.

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24734:


Assignee: Apache Spark

> Fix containsNull of Concat for array type.
> --
>
> Key: SPARK-24734
> URL: https://issues.apache.org/jira/browse/SPARK-24734
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> Currently {{Concat}} for array type uses the data type of the first child as 
> its own data type, but the children might include an array containing nulls.
> We should take the nullabilities of all children into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531225#comment-16531225
 ] 

Wenchen Fan edited comment on SPARK-24727 at 7/3/18 11:29 AM:
--

BTW this  needs to be a static conf. the CodeGenerator object is per JVM.


was (Author: cloud_fan):
BTW this  needs to be a static SQL. the CodeGenerator object is per JVM.

> The cache 100 in CodeGenerator is too small for streaming
> -
>
> Key: SPARK-24727
> URL: https://issues.apache.org/jira/browse/SPARK-24727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: ant_nebula
>Priority: Major
>
> {code:java}
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 
> private val cache = CacheBuilder.newBuilder().maximumSize(100).build{code}
> The cache of 100 entries in CodeGenerator is too small for real-time streaming 
> computation, although it is fine for offline computation: real-time streaming 
> computation is usually more complex within a single driver and is performance 
> sensitive.
> I suggest Spark make this user-configurable with a default of 100, e.g. 
> spark.codegen.cache=1000
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531225#comment-16531225
 ] 

Wenchen Fan commented on SPARK-24727:
-

BTW this  needs to be a static SQL. the CodeGenerator object is per JVM.

> The cache 100 in CodeGenerator is too small for streaming
> -
>
> Key: SPARK-24727
> URL: https://issues.apache.org/jira/browse/SPARK-24727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: ant_nebula
>Priority: Major
>
> {code:java}
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 
> private val cache = CacheBuilder.newBuilder().maximumSize(100).build{code}
> The cache of 100 entries in CodeGenerator is too small for real-time streaming 
> computation, although it is fine for offline computation: real-time streaming 
> computation is usually more complex within a single driver and is performance 
> sensitive.
> I suggest Spark make this user-configurable with a default of 100, e.g. 
> spark.codegen.cache=1000
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531224#comment-16531224
 ] 

Wenchen Fan commented on SPARK-24727:
-

It's because it was hard to access SQLConf on the executor side. I think we can 
do it now.

> The cache 100 in CodeGenerator is too small for streaming
> -
>
> Key: SPARK-24727
> URL: https://issues.apache.org/jira/browse/SPARK-24727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: ant_nebula
>Priority: Major
>
> {code:java}
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 
> private val cache = CacheBuilder.newBuilder().maximumSize(100).build{code}
> The cache of 100 entries in CodeGenerator is too small for real-time streaming 
> computation, although it is fine for offline computation: real-time streaming 
> computation is usually more complex within a single driver and is performance 
> sensitive.
> I suggest Spark make this user-configurable with a default of 100, e.g. 
> spark.codegen.cache=1000
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24734) Fix containsNull of Concat for array type.

2018-07-03 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-24734:
-

 Summary: Fix containsNull of Concat for array type.
 Key: SPARK-24734
 URL: https://issues.apache.org/jira/browse/SPARK-24734
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Takuya Ueshin


Currently {{Concat}} for array type uses the data type of the first child as 
its own data type, but the children might include an array containing nulls.
We should take the nullabilities of all children into account.
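
A hedged sketch of the intended behaviour (not the actual patch): the result 
element type's containsNull should be true if any child array may contain 
nulls.

{code:scala}
import org.apache.spark.sql.types.ArrayType

// Combine the child array types: keep the element type, but OR together the
// children's containsNull flags.
def concatArrayType(childTypes: Seq[ArrayType]): ArrayType = {
  require(childTypes.nonEmpty, "Concat needs at least one array child")
  ArrayType(
    childTypes.head.elementType,
    containsNull = childTypes.exists(_.containsNull))
}
{code}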



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24733) Dataframe saved to parquet can have different metadata than the resulting parquet file

2018-07-03 Thread David Herskovics (JIRA)
David Herskovics created SPARK-24733:


 Summary: Dataframe saved to parquet can have different metadata 
than the resulting parquet file
 Key: SPARK-24733
 URL: https://issues.apache.org/jira/browse/SPARK-24733
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: David Herskovics


See the repro using spark-shell below:
Let's say that we have a dataframe called *df_with_metadata* which has a column 
*name* with metadata attached.
 
{code:scala}
scala> df_with_metadata.schema.json // Check that we have the metadata here.
scala> df_with_metadata.createOrReplaceTempView("input")
scala> val df2 = spark.sql("select case when true then name else null end as 
name from input")
scala> df2.schema.json // We don't have the metadata anymore.
scala> df2.write.parquet("no_metadata_expected")
scala> val df3 = spark.read.parquet("no_metadata_expected")
scala> df3.schema.json // And the metadata is there again so the 
no_metadata_expected does have metadata.
{code}
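
For completeness, a hedged sketch of one way to construct the 
*df_with_metadata* used above (the metadata key and value are made up for 
illustration):

{code:scala}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder

// Attach some column-level metadata to a "name" column.
val md = new MetadataBuilder().putString("comment", "customer name").build()
val df_with_metadata = spark.range(3).toDF("id")
  .withColumn("name", col("id").cast("string").as("name", md))
{code}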



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531144#comment-16531144
 ] 

Takeshi Yamamuro commented on SPARK-24727:
--

I see. It makes some sense to me. Any reason to hard-code the value? cc: 
[~smilegator] [~cloud_fan]

> The cache 100 in CodeGenerator is too small for streaming
> -
>
> Key: SPARK-24727
> URL: https://issues.apache.org/jira/browse/SPARK-24727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: ant_nebula
>Priority: Major
>
> {code:java}
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 
> private val cache = CacheBuilder.newBuilder().maximumSize(100).build{code}
> The cache of 100 entries in CodeGenerator is too small for real-time streaming 
> computation, although it is fine for offline computation: real-time streaming 
> computation is usually more complex within a single driver and is performance 
> sensitive.
> I suggest Spark make this user-configurable with a default of 100, e.g. 
> spark.codegen.cache=1000
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread ant_nebula (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531107#comment-16531107
 ] 

ant_nebula commented on SPARK-24727:


No. Spark does DAG scheduling for each streaming batchDuration job.

If the jobs of one streaming batchDuration completely fill the 100-entry cache, 
then the code that overflows the cache has to be recompiled by Janino in every 
streaming batchDuration job.

> The cache 100 in CodeGenerator is too small for streaming
> -
>
> Key: SPARK-24727
> URL: https://issues.apache.org/jira/browse/SPARK-24727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: ant_nebula
>Priority: Major
>
> {code:java}
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 
> private val cache = CacheBuilder.newBuilder().maximumSize(100).build{code}
> The cache of 100 entries in CodeGenerator is too small for real-time streaming 
> computation, although it is fine for offline computation: real-time streaming 
> computation is usually more complex within a single driver and is performance 
> sensitive.
> I suggest Spark make this user-configurable with a default of 100, e.g. 
> spark.codegen.cache=1000
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24729) Spark - stackoverflow error - org.apache.spark.sql.catalyst.plans.QueryPlan

2018-07-03 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531071#comment-16531071
 ] 

Takeshi Yamamuro commented on SPARK-24729:
--

Can you run on v2.3.1?

> Spark - stackoverflow error - org.apache.spark.sql.catalyst.plans.QueryPlan
> ---
>
> Key: SPARK-24729
> URL: https://issues.apache.org/jira/browse/SPARK-24729
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.1.1
>Reporter: t oo
>Priority: Major
>
> Ran a Spark (v2.1.1) job that joins 2 RDDs (one is a .txt file from S3, the 
> other is parquet from S3). The job then merges the datasets (i.e. keeps the 
> latest row per PK; if a PK exists in both the .txt and the parquet, the row 
> from the .txt is taken) and writes out a new parquet to S3. Got this error, 
> but upon re-running it worked fine. Both the .txt and the parquet have 302 
> columns. The .txt has 191 rows; the parquet has 156300 rows. Does anyone know 
> the cause?
>  
> {code:java}
>  
> 18/07/02 13:51:56 INFO TaskSetManager: Starting task 0.0 in stage 14.0 (TID 
> 134, 10.160.122.226, executor 0, partition 0, PROCESS_LOCAL, 6337 bytes)
> 18/07/02 13:51:56 INFO BlockManagerInfo: Added broadcast_18_piece0 in memory 
> on 10.160.122.226:38011 (size: 27.2 KB, free: 4.6 GB)
> 18/07/02 13:51:56 INFO TaskSetManager: Finished task 0.0 in stage 14.0 (TID 
> 134) in 295 ms on 10.160.122.226 (executor 0) (1/1)
> 18/07/02 13:51:56 INFO TaskSchedulerImpl: Removed TaskSet 14.0, whose tasks 
> have all completed, from pool
> 18/07/02 13:51:56 INFO DAGScheduler: ResultStage 14 (load at Data.scala:25) 
> finished in 0.295 s
> 18/07/02 13:51:56 INFO DAGScheduler: Job 7 finished: load at Data.scala:25, 
> took 0.310932 s
> 18/07/02 13:51:57 INFO FileSourceStrategy: Pruning directories with:
> 18/07/02 13:51:57 INFO FileSourceStrategy: Post-Scan Filters:
> 18/07/02 13:51:57 INFO FileSourceStrategy: Output Data Schema: struct string, created: timestamp, created_by: string, last_upd: timestamp, 
> last_upd_by: string ... 300 more fields>
> 18/07/02 13:51:57 INFO FileSourceStrategy: Pushed Filters:
> 18/07/02 13:51:57 INFO MemoryStore: Block broadcast_19 stored as values in 
> memory (estimated size 387.2 KB, free 911.2 MB)
> 18/07/02 13:51:57 INFO MemoryStore: Block broadcast_19_piece0 stored as bytes 
> in memory (estimated size 33.7 KB, free 911.1 MB)
> 18/07/02 13:51:57 INFO BlockManagerInfo: Added broadcast_19_piece0 in memory 
> on 10.160.123.242:38105 (size: 33.7 KB, free: 912.2 MB)
> 18/07/02 13:51:57 INFO SparkContext: Created broadcast 19 from cache at 
> Upsert.scala:25
> 18/07/02 13:51:57 INFO FileSourceScanExec: Planning scan with bin packing, 
> max size: 48443541 bytes, open cost is considered as scanning 4194304 bytes.
> 18/07/02 13:51:57 INFO SparkContext: Starting job: take at Utils.scala:28
> 18/07/02 13:51:57 INFO DAGScheduler: Got job 8 (take at Utils.scala:28) with 
> 1 output partitions
> 18/07/02 13:51:57 INFO DAGScheduler: Final stage: ResultStage 15 (take at 
> Utils.scala:28)
> 18/07/02 13:51:57 INFO DAGScheduler: Parents of final stage: List()
> 18/07/02 13:51:57 INFO DAGScheduler: Missing parents: List()
> 18/07/02 13:51:57 INFO DAGScheduler: Submitting ResultStage 15 
> (MapPartitionsRDD[65] at take at Utils.scala:28), which has no missing parents
> 18/07/02 13:51:57 INFO MemoryStore: Block broadcast_20 stored as values in 
> memory (estimated size 321.5 KB, free 910.8 MB)
> 18/07/02 13:51:57 INFO MemoryStore: Block broadcast_20_piece0 stored as bytes 
> in memory (estimated size 93.0 KB, free 910.7 MB)
> 18/07/02 13:51:57 INFO BlockManagerInfo: Added broadcast_20_piece0 in memory 
> on 10.160.123.242:38105 (size: 93.0 KB, free: 912.1 MB)
> 18/07/02 13:51:57 INFO SparkContext: Created broadcast 20 from broadcast at 
> DAGScheduler.scala:996
> 18/07/02 13:51:57 INFO DAGScheduler: Submitting 1 missing tasks from 
> ResultStage 15 (MapPartitionsRDD[65] at take at Utils.scala:28)
> 18/07/02 13:51:57 INFO TaskSchedulerImpl: Adding task set 15.0 with 1 tasks
> 18/07/02 13:51:57 INFO TaskSetManager: Starting task 0.0 in stage 15.0 (TID 
> 135, 10.160.122.226, executor 0, partition 0, PROCESS_LOCAL, 9035 bytes)
> 18/07/02 13:51:57 INFO BlockManagerInfo: Added broadcast_20_piece0 in memory 
> on 10.160.122.226:38011 (size: 93.0 KB, free: 4.6 GB)
> 18/07/02 13:51:57 INFO BlockManagerInfo: Added broadcast_19_piece0 in memory 
> on 10.160.122.226:38011 (size: 33.7 KB, free: 4.6 GB)
> 18/07/02 13:52:05 INFO BlockManagerInfo: Added rdd_61_0 in memory on 
> 10.160.122.226:38011 (size: 38.9 MB, free: 4.5 GB)
> 18/07/02 13:52:09 INFO BlockManagerInfo: Added rdd_63_0 in memory on 
> 10.160.122.226:38011 (size: 38.9 MB, free: 4.5 GB)
> 18/07/02 13:52:09 INFO TaskSetManager: Finished task 0.0 in stage 15.0 (TID 
>
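
The merge step described in that report (latest row per PK, with the .txt source 
winning ties) could look roughly like the sketch below. This is a hedged 
illustration only: the column names "pk" and "last_upd", the tie-breaking scheme, 
and the assumption that both inputs share the same schema and column order are 
assumptions, not details taken from the report.

{code:scala}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lit, row_number}

def mergeLatest(txt: DataFrame, parquet: DataFrame): DataFrame = {
  // Tag each source so the .txt rows win when a PK appears in both inputs.
  val unioned = txt.withColumn("src_priority", lit(0))
    .union(parquet.withColumn("src_priority", lit(1)))

  // Keep one row per PK: newest last_upd first, then the preferred source.
  val w = Window.partitionBy("pk").orderBy(col("last_upd").desc, col("src_priority"))
  unioned.withColumn("rn", row_number().over(w))
    .where(col("rn") === 1)
    .drop("rn", "src_priority")
}
{code}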

[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531058#comment-16531058
 ] 

Takeshi Yamamuro commented on SPARK-24727:
--

Does your streaming query change on every batch it runs?

> The cache 100 in CodeGenerator is too small for streaming
> -
>
> Key: SPARK-24727
> URL: https://issues.apache.org/jira/browse/SPARK-24727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: ant_nebula
>Priority: Major
>
> {code:java}
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 
> private val cache = CacheBuilder.newBuilder().maximumSize(100).build{code}
> The cache size of 100 in CodeGenerator is too small for real-time streaming 
> computation, although it is fine for offline computation, because real-time 
> streaming workloads tend to run more complex plans in a single driver and are 
> performance sensitive.
> I suggest Spark make this size configurable for users, with a default of 100, 
> e.g. spark.codegen.cache=1000
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24705) Spark.sql.adaptive.enabled=true is enabled and self-join query

2018-07-03 Thread cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531023#comment-16531023
 ] 

cheng commented on SPARK-24705:
---

It is currently turned off. If I want to use this feature to improve performance, 
that is currently not possible. I think this is a bug, and it needs to be fixed.

> Spark.sql.adaptive.enabled=true is enabled and self-join query
> --
>
> Key: SPARK-24705
> URL: https://issues.apache.org/jira/browse/SPARK-24705
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.3.1
>Reporter: cheng
>Priority: Minor
> Attachments: Error stack.txt
>
>
> [~smilegator]
> When loading data via JDBC with spark.sql.adaptive.enabled=true (for example, 
> loading a table such as tableA), unexpected results can occur when you run 
> the following query.
> For example, with a device_loc table that comes from the JDBC data source:
> select tv_a.imei
> from ( select a.imei,a.speed from device_loc a) tv_a
> inner join ( select a.imei,a.speed from device_loc a ) tv_b on tv_a.imei = 
> tv_b.imei
> group by tv_a.imei
> If CACHE TABLE device_loc is executed before this query, everything is fine. 
> However, if the table is not cached, the query fails to execute.
> Remarks: the attachment records the stack trace from when the error occurred.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24681) Cannot create a view from a table when a nested column name contains ':'

2018-07-03 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531016#comment-16531016
 ] 

Takeshi Yamamuro edited comment on SPARK-24681 at 7/3/18 8:37 AM:
--

I've looked over the related code and I think we cannot use `:` in Hive metastore 
column names: 
[https://github.com/apache/hive/blob/release-1.2.1/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java#L239]

The current master only checks that column names don't include a comma 
(you fixed this a year ago): 
[https://github.com/apache/spark/blob/a7c8f0c8cb144a026ea21e8780107e363ceacb8d/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L141]
IMHO we need to check ':' and ';' here, too. WDYT?

Or should we accept ':' in column names instead?

 


was (Author: maropu):
I've looked over the related code and I think we cannot use `:` in Hive metastore 
column names: 
https://github.com/apache/hive/blob/release-1.2.1/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java#L239

The current master only checks that column names don't include a comma 
(you fixed this a year ago): 
https://github.com/apache/spark/blob/a7c8f0c8cb144a026ea21e8780107e363ceacb8d/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L141
So, IMHO we need to check ':' and ';' here, too. WDYT? Or should we accept 
':' in column names instead?


 

> Cannot create a view from a table when a nested column name contains ':'
> 
>
> Key: SPARK-24681
> URL: https://issues.apache.org/jira/browse/SPARK-24681
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Adrian Ionescu
>Priority: Major
>
> Here's a patch that reproduces the issue: 
> {code:java}
> diff --git 
> a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> index 09c1547..29bb3db 100644 
> --- 
> a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> +++ 
> b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> @@ -19,6 +19,7 @@ package org.apache.spark.sql.hive 
>  
> import org.apache.spark.sql.{QueryTest, Row} 
> import org.apache.spark.sql.execution.datasources.parquet.ParquetTest 
> +import org.apache.spark.sql.functions.{lit, struct} 
> import org.apache.spark.sql.hive.test.TestHiveSingleton 
>  
> case class Cases(lower: String, UPPER: String) 
> @@ -76,4 +77,21 @@ class HiveParquetSuite extends QueryTest with ParquetTest 
> with TestHiveSingleton 
>   } 
> } 
>   } 
> + 
> +  test("column names including ':' characters") { 
> +    withTempPath { path => 
> +  withTable("test_table") { 
> +    spark.range(0) 
> +  .select(struct(lit(0).as("nested:column")).as("toplevel:column")) 
> +  .write.format("parquet") 
> +  .option("path", path.getCanonicalPath) 
> +  .saveAsTable("test_table") 
> + 
> +    sql("CREATE VIEW test_view_1 AS SELECT `toplevel:column`.* FROM 
> test_table") 
> +    sql("CREATE VIEW test_view_2 AS SELECT * FROM test_table") 
> + 
> +  } 
> +    } 
> +  } 
> }{code}
> The first "CREATE VIEW" statement succeeds, but the second one fails with:
> {code:java}
> org.apache.spark.SparkException: Cannot recognize hive type string: 
> struct
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24681) Cannot create a view from a table when a nested column name contains ':'

2018-07-03 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531016#comment-16531016
 ] 

Takeshi Yamamuro commented on SPARK-24681:
--

I've looked over the related code and I think we cannot use `:` in Hive metastore 
column names: 
https://github.com/apache/hive/blob/release-1.2.1/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java#L239

The current master only checks that column names don't include a comma 
(you fixed this a year ago): 
https://github.com/apache/spark/blob/a7c8f0c8cb144a026ea21e8780107e363ceacb8d/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L141
So, IMHO we need to check ':' and ';' here, too. WDYT? Or should we accept 
':' in column names instead?
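
A hypothetical sketch of that extra check, written as if it sat next to the 
existing comma check in HiveExternalCatalog; the method name and message wording 
are assumptions, and the real catalog code would raise AnalysisException rather 
than IllegalArgumentException.

{code:scala}
import org.apache.spark.sql.types.StructType

def verifyColumnNames(tableName: String, schema: StructType): Unit = {
  // Characters Hive's metastore type parser cannot round-trip in column names.
  val forbidden = Seq(",", ":", ";")
  schema.fieldNames.foreach { colName =>
    forbidden.find(ch => colName.contains(ch)).foreach { ch =>
      throw new IllegalArgumentException(
        s"Cannot create a table having a column whose name contains '$ch' " +
          s"in Hive metastore. Table: $tableName; Column: $colName")
    }
  }
}
{code}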


 

> Cannot create a view from a table when a nested column name contains ':'
> 
>
> Key: SPARK-24681
> URL: https://issues.apache.org/jira/browse/SPARK-24681
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Adrian Ionescu
>Priority: Major
>
> Here's a patch that reproduces the issue: 
> {code:java}
> diff --git 
> a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> index 09c1547..29bb3db 100644 
> --- 
> a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> +++ 
> b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala 
> @@ -19,6 +19,7 @@ package org.apache.spark.sql.hive 
>  
> import org.apache.spark.sql.{QueryTest, Row} 
> import org.apache.spark.sql.execution.datasources.parquet.ParquetTest 
> +import org.apache.spark.sql.functions.{lit, struct} 
> import org.apache.spark.sql.hive.test.TestHiveSingleton 
>  
> case class Cases(lower: String, UPPER: String) 
> @@ -76,4 +77,21 @@ class HiveParquetSuite extends QueryTest with ParquetTest 
> with TestHiveSingleton 
>   } 
> } 
>   } 
> + 
> +  test("column names including ':' characters") { 
> +    withTempPath { path => 
> +  withTable("test_table") { 
> +    spark.range(0) 
> +  .select(struct(lit(0).as("nested:column")).as("toplevel:column")) 
> +  .write.format("parquet") 
> +  .option("path", path.getCanonicalPath) 
> +  .saveAsTable("test_table") 
> + 
> +    sql("CREATE VIEW test_view_1 AS SELECT `toplevel:column`.* FROM 
> test_table") 
> +    sql("CREATE VIEW test_view_2 AS SELECT * FROM test_table") 
> + 
> +  } 
> +    } 
> +  } 
> }{code}
> The first "CREATE VIEW" statement succeeds, but the second one fails with:
> {code:java}
> org.apache.spark.SparkException: Cannot recognize hive type string: 
> struct
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24705) Spark.sql.adaptive.enabled=true is enabled and self-join query

2018-07-03 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531007#comment-16531007
 ] 

Takeshi Yamamuro commented on SPARK-24705:
--

Why don't you turn off `spark.sql.adaptive.enabled`?
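
Two stop-gap workarounds implied by this thread, sketched for a spark-shell 
session (they work around the failure rather than fix the underlying bug):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// 1. Disable adaptive execution for the session before running the self-join.
spark.conf.set("spark.sql.adaptive.enabled", "false")

// 2. Or, as the reporter notes, caching the JDBC-backed table first also
//    avoids the failure.
spark.sql("CACHE TABLE device_loc")
{code}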

> Spark.sql.adaptive.enabled=true is enabled and self-join query
> --
>
> Key: SPARK-24705
> URL: https://issues.apache.org/jira/browse/SPARK-24705
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.3.1
>Reporter: cheng
>Priority: Minor
> Attachments: Error stack.txt
>
>
> [~smilegator]
> When loading data via JDBC with spark.sql.adaptive.enabled=true (for example, 
> loading a table such as tableA), unexpected results can occur when you run 
> the following query.
> For example, with a device_loc table that comes from the JDBC data source:
> select tv_a.imei
> from ( select a.imei,a.speed from device_loc a) tv_a
> inner join ( select a.imei,a.speed from device_loc a ) tv_b on tv_a.imei = 
> tv_b.imei
> group by tv_a.imei
> If CACHE TABLE device_loc is executed before this query, everything is fine. 
> However, if the table is not cached, the query fails to execute.
> Remarks: the attachment records the stack trace from when the error occurred.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24732) Type coercion between MapTypes.

2018-07-03 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530993#comment-16530993
 ] 

Apache Spark commented on SPARK-24732:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/21703

> Type coercion between MapTypes.
> ---
>
> Key: SPARK-24732
> URL: https://issues.apache.org/jira/browse/SPARK-24732
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Takuya Ueshin
>Priority: Major
>
> It seems currently we don't allow type coercion between maps.
> We can support type coercion between MapTypes where both the key types and 
> the value types are compatible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24732) Type coercion between MapTypes.

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24732:


Assignee: Apache Spark

> Type coercion between MapTypes.
> ---
>
> Key: SPARK-24732
> URL: https://issues.apache.org/jira/browse/SPARK-24732
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> It seems currently we don't allow type coercion between maps.
> We can support type coercion between MapTypes where both the key types and 
> the value types are compatible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24732) Type coercion between MapTypes.

2018-07-03 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24732:


Assignee: (was: Apache Spark)

> Type coercion between MapTypes.
> ---
>
> Key: SPARK-24732
> URL: https://issues.apache.org/jira/browse/SPARK-24732
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Takuya Ueshin
>Priority: Major
>
> It seems currently we don't allow type coercion between maps.
> We can support type coercion between MapTypes where both the key types and 
> the value types are compatible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24732) Type coercion between MapTypes.

2018-07-03 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-24732:
-

 Summary: Type coercion between MapTypes.
 Key: SPARK-24732
 URL: https://issues.apache.org/jira/browse/SPARK-24732
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.1
Reporter: Takuya Ueshin


It seems currently we don't allow type coercion between maps.
We can support type coercion between MapTypes where both the key types and the 
value types are compatible.
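
A hedged sketch of the proposed rule (not the actual patch in the pull request 
linked earlier in this thread): coerce two MapTypes by finding a common type for 
the keys and for the values; `findTightest` below stands in for Catalyst's 
internal TypeCoercion helper and is an assumption, not an exact API.

{code:scala}
import org.apache.spark.sql.types.{DataType, MapType}

def widerMapType(
    m1: MapType,
    m2: MapType,
    findTightest: (DataType, DataType) => Option[DataType]): Option[MapType] = {
  for {
    keyType   <- findTightest(m1.keyType, m2.keyType)
    valueType <- findTightest(m1.valueType, m2.valueType)
  } yield MapType(keyType, valueType, m1.valueContainsNull || m2.valueContainsNull)
}
{code}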



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24705) Spark.sql.adaptive.enabled=true is enabled and self-join query

2018-07-03 Thread cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530978#comment-16530978
 ] 

cheng commented on SPARK-24705:
---

This is indeed the case; this kind of query is used often in my business, which is 
how I ran into the problem. I hope the community can help solve it.

> Spark.sql.adaptive.enabled=true is enabled and self-join query
> --
>
> Key: SPARK-24705
> URL: https://issues.apache.org/jira/browse/SPARK-24705
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.3.1
>Reporter: cheng
>Priority: Minor
> Attachments: Error stack.txt
>
>
> [~smilegator]
> When loading data via JDBC with spark.sql.adaptive.enabled=true (for example, 
> loading a table such as tableA), unexpected results can occur when you run 
> the following query.
> For example, with a device_loc table that comes from the JDBC data source:
> select tv_a.imei
> from ( select a.imei,a.speed from device_loc a) tv_a
> inner join ( select a.imei,a.speed from device_loc a ) tv_b on tv_a.imei = 
> tv_b.imei
> group by tv_a.imei
> If CACHE TABLE device_loc is executed before this query, everything is fine. 
> However, if the table is not cached, the query fails to execute.
> Remarks: the attachment records the stack trace from when the error occurred.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24535) Fix java version parsing in SparkR on Windows

2018-07-03 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530963#comment-16530963
 ] 

Felix Cheung edited comment on SPARK-24535 at 7/3/18 7:50 AM:
--

ok, I submitted a modified package to win-builder, and was able to confirm the 
problem with launchScript. updating with the fix, result here 
[https://win-builder.r-project.org/zD6OfPID9JtR/00check.log]

basically, I don't think launchScript(.. wait = T) is working on Windows.


was (Author: felixcheung):
ok, I submitted a modified package to win-builder, and was able to confirm the 
problem with launchScript. updating with the fix, result here 
https://win-builder.r-project.org/zD6OfPID9JtR/00check.log

> Fix java version parsing in SparkR on Windows
> -
>
> Key: SPARK-24535
> URL: https://issues.apache.org/jira/browse/SPARK-24535
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Shivaram Venkataraman
>Assignee: Felix Cheung
>Priority: Blocker
>
> We see errors on CRAN of the form 
> {code:java}
>   java version "1.8.0_144"
>   Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
>   Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>   Picked up _JAVA_OPTIONS: -XX:-UsePerfData 
>   -- 1. Error: create DataFrame from list or data.frame (@test_basic.R#21)  
> --
>   subscript out of bounds
>   1: sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
> sparkConfig = sparkRTestConfig) at 
> D:/temp/RtmpIJ8Cc3/RLIBS_3242c713c3181/SparkR/tests/testthat/test_basic.R:21
>   2: sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, 
> sparkExecutorEnvMap, 
>  sparkJars, sparkPackages)
>   3: checkJavaVersion()
>   4: strsplit(javaVersionFilter[[1]], "[\"]")
> {code}
> The complete log file is at 
> http://home.apache.org/~shivaram/SparkR_2.3.1_check_results/Windows/00check.log



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24535) Fix java version parsing in SparkR on Windows

2018-07-03 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-24535:
-
Summary: Fix java version parsing in SparkR on Windows  (was: Fix java 
version parsing in SparkR)

> Fix java version parsing in SparkR on Windows
> -
>
> Key: SPARK-24535
> URL: https://issues.apache.org/jira/browse/SPARK-24535
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Shivaram Venkataraman
>Assignee: Felix Cheung
>Priority: Blocker
>
> We see errors on CRAN of the form 
> {code:java}
>   java version "1.8.0_144"
>   Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
>   Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>   Picked up _JAVA_OPTIONS: -XX:-UsePerfData 
>   -- 1. Error: create DataFrame from list or data.frame (@test_basic.R#21)  
> --
>   subscript out of bounds
>   1: sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
> sparkConfig = sparkRTestConfig) at 
> D:/temp/RtmpIJ8Cc3/RLIBS_3242c713c3181/SparkR/tests/testthat/test_basic.R:21
>   2: sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, 
> sparkExecutorEnvMap, 
>  sparkJars, sparkPackages)
>   3: checkJavaVersion()
>   4: strsplit(javaVersionFilter[[1]], "[\"]")
> {code}
> The complete log file is at 
> http://home.apache.org/~shivaram/SparkR_2.3.1_check_results/Windows/00check.log



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24535) Fix java version parsing in SparkR

2018-07-03 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530963#comment-16530963
 ] 

Felix Cheung commented on SPARK-24535:
--

ok, I submitted a modified package to win-builder, and was able to confirm the 
problem with launchScript. updating with the fix, result here 
https://win-builder.r-project.org/zD6OfPID9JtR/00check.log

> Fix java version parsing in SparkR
> --
>
> Key: SPARK-24535
> URL: https://issues.apache.org/jira/browse/SPARK-24535
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Shivaram Venkataraman
>Assignee: Felix Cheung
>Priority: Blocker
>
> We see errors on CRAN of the form 
> {code:java}
>   java version "1.8.0_144"
>   Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
>   Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>   Picked up _JAVA_OPTIONS: -XX:-UsePerfData 
>   -- 1. Error: create DataFrame from list or data.frame (@test_basic.R#21)  
> --
>   subscript out of bounds
>   1: sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
> sparkConfig = sparkRTestConfig) at 
> D:/temp/RtmpIJ8Cc3/RLIBS_3242c713c3181/SparkR/tests/testthat/test_basic.R:21
>   2: sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, 
> sparkExecutorEnvMap, 
>  sparkJars, sparkPackages)
>   3: checkJavaVersion()
>   4: strsplit(javaVersionFilter[[1]], "[\"]")
> {code}
> The complete log file is at 
> http://home.apache.org/~shivaram/SparkR_2.3.1_check_results/Windows/00check.log



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24731) java.io.IOException: s3n://bucketname: 400 : Bad Request

2018-07-03 Thread sivakphani (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivakphani updated SPARK-24731:
---
Summary:  java.io.IOException: s3n://bucketname: 400 : Bad Request  (was:  
java.io.IOException: s3n://aail-twitter : 400 : Bad Request)

>  java.io.IOException: s3n://bucketname: 400 : Bad Request
> -
>
> Key: SPARK-24731
> URL: https://issues.apache.org/jira/browse/SPARK-24731
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.1
>Reporter: sivakphani
>Priority: Major
>
> I wrote code to connect to an AWS S3 bucket and read a JSON file through PySpark.
> When I submit it locally, I get this error:
>  File "PYSPARK_examples/Pyspark11.py", line 105, in 
>     df=sqlContext.read.json(path2)
>   File 
> "/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py",
>  line 261, in json
>   File 
> "/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
>  line 1257, in __call__
>   File 
> "/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py",
>  line 63, in deco
>   File 
> "/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py",
>  line 328, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o29.json.
> : java.io.IOException: s3n://bucketname: 400 : Bad Request
>     at 
> org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:453)
>     at 
> org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
>     at 
> org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
>     at 
> org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at org.apache.hadoop.fs.s3native.$Proxy12.retrieveMetadata(Unknown Source)
>     at 
> org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:476)
>     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
>     at 
> org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:714)
>     at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
>     at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
>     at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>     at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>     at scala.collection.immutable.List.foreach(List.scala:381)
>     at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>     at scala.collection.immutable.List.flatMap(List.scala:344)
>     at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:388)
>     at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
>     at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:397)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>     at py4j.Gateway.invoke(Gateway.java:282)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:238)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.jets3t.service.impl.rest.HttpException: 400 Bad Request
>     at 
> org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:425)
>     at 
> org.jets

[jira] [Created] (SPARK-24731) java.io.IOException: s3n://aail-twitter : 400 : Bad Request

2018-07-03 Thread sivakphani (JIRA)
sivakphani created SPARK-24731:
--

 Summary:  java.io.IOException: s3n://aail-twitter : 400 : Bad 
Request
 Key: SPARK-24731
 URL: https://issues.apache.org/jira/browse/SPARK-24731
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 2.3.1
Reporter: sivakphani


I wrote code to connect to an AWS S3 bucket and read a JSON file through PySpark.

When I submit it locally, I get this error:

 File "PYSPARK_examples/Pyspark11.py", line 105, in 
    df=sqlContext.read.json(path2)
  File 
"/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py",
 line 261, in json
  File 
"/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
 line 1257, in __call__
  File 
"/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py",
 line 63, in deco
  File 
"/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py",
 line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o29.json.
: java.io.IOException: s3n://bucketname: 400 : Bad Request
    at 
org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:453)
    at 
org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
    at 
org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
    at 
org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at org.apache.hadoop.fs.s3native.$Proxy12.retrieveMetadata(Unknown Source)
    at 
org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:476)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
    at 
org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:714)
    at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
    at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
    at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:344)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:388)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:397)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.jets3t.service.impl.rest.HttpException: 400 Bad Request
    at 
org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:425)
    at 
org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:279)
    at 
org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:1052)
    at 
org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2264)
    at 
org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2193)
    at 
org.jets3t.service.StorageService.getObjectDetails(S
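
A 400 Bad Request from the old s3n connector is often a request-signing or 
endpoint problem (for example, a bucket in a region that only accepts V4 
signatures, which the jets3t-based s3n code path does not handle well); the usual 
advice is to switch to the s3a connector. Below is a hedged sketch of an 
s3a-based read, assuming the hadoop-aws package is on the classpath; the bucket 
name, endpoint and credential sources are placeholders, not values from this 
report.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("s3a-json-read").getOrCreate()

val hconf = spark.sparkContext.hadoopConfiguration
hconf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hconf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
// Placeholder endpoint; set it to the bucket's actual region endpoint.
hconf.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")

val df = spark.read.json("s3a://bucketname/path/to/file.json")
df.printSchema()
{code}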

[jira] [Commented] (SPARK-23698) Spark code contains numerous undefined names in Python 3

2018-07-03 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530938#comment-16530938
 ] 

Apache Spark commented on SPARK-23698:
--

User 'cclauss' has created a pull request for this issue:
https://github.com/apache/spark/pull/21702

> Spark code contains numerous undefined names in Python 3
> 
>
> Key: SPARK-23698
> URL: https://issues.apache.org/jira/browse/SPARK-23698
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: cclauss
>Priority: Minor
>
> flake8 testing of https://github.com/apache/spark on Python 3.6.3
> $ *flake8 . --count --select=E901,E999,F821,F822,F823 --show-source 
> --statistics*
> ./dev/merge_spark_pr.py:98:14: F821 undefined name 'raw_input'
> result = raw_input("\n%s (y/n): " % prompt)
>  ^
> ./dev/merge_spark_pr.py:136:22: F821 undefined name 'raw_input'
> primary_author = raw_input(
>  ^
> ./dev/merge_spark_pr.py:186:16: F821 undefined name 'raw_input'
> pick_ref = raw_input("Enter a branch name [%s]: " % default_branch)
>^
> ./dev/merge_spark_pr.py:233:15: F821 undefined name 'raw_input'
> jira_id = raw_input("Enter a JIRA id [%s]: " % default_jira_id)
>   ^
> ./dev/merge_spark_pr.py:278:20: F821 undefined name 'raw_input'
> fix_versions = raw_input("Enter comma-separated fix version(s) [%s]: " % 
> default_fix_versions)
>^
> ./dev/merge_spark_pr.py:317:28: F821 undefined name 'raw_input'
> raw_assignee = raw_input(
>^
> ./dev/merge_spark_pr.py:430:14: F821 undefined name 'raw_input'
> pr_num = raw_input("Which pull request would you like to merge? (e.g. 
> 34): ")
>  ^
> ./dev/merge_spark_pr.py:442:18: F821 undefined name 'raw_input'
> result = raw_input("Would you like to use the modified title? (y/n): 
> ")
>  ^
> ./dev/merge_spark_pr.py:493:11: F821 undefined name 'raw_input'
> while raw_input("\n%s (y/n): " % pick_prompt).lower() == "y":
>   ^
> ./dev/create-release/releaseutils.py:58:16: F821 undefined name 'raw_input'
> response = raw_input("%s [y/n]: " % msg)
>^
> ./dev/create-release/releaseutils.py:152:38: F821 undefined name 'unicode'
> author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
>  ^
> ./python/setup.py:37:11: F821 undefined name '__version__'
> VERSION = __version__
>   ^
> ./python/pyspark/cloudpickle.py:275:18: F821 undefined name 'buffer'
> dispatch[buffer] = save_buffer
>  ^
> ./python/pyspark/cloudpickle.py:807:18: F821 undefined name 'file'
> dispatch[file] = save_file
>  ^
> ./python/pyspark/sql/conf.py:61:61: F821 undefined name 'unicode'
> if not isinstance(obj, str) and not isinstance(obj, unicode):
> ^
> ./python/pyspark/sql/streaming.py:25:21: F821 undefined name 'long'
> intlike = (int, long)
> ^
> ./python/pyspark/streaming/dstream.py:405:35: F821 undefined name 'long'
> return self._sc._jvm.Time(long(timestamp * 1000))
>   ^
> ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:21:10: F821 
> undefined name 'xrange'
> for i in xrange(50):
>  ^
> ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:22:14: F821 
> undefined name 'xrange'
> for j in xrange(5):
>  ^
> ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:23:18: F821 
> undefined name 'xrange'
> for k in xrange(20022):
>  ^
> 20     F821 undefined name 'raw_input'
> 20



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org