[jira] [Assigned] (SPARK-27527) Improve description of Timestamp and Date types
[ https://issues.apache.org/jira/browse/SPARK-27527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-27527: Assignee: Maxim Gekk > Improve description of Timestamp and Date types > --- > > Key: SPARK-27527 > URL: https://issues.apache.org/jira/browse/SPARK-27527 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 2.4.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Trivial > > Describe precisely semantic of TimestampType and DateType, how they represent > dates and timestamps internally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27527) Improve description of Timestamp and Date types
[ https://issues.apache.org/jira/browse/SPARK-27527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-27527. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24424 [https://github.com/apache/spark/pull/24424] > Improve description of Timestamp and Date types > --- > > Key: SPARK-27527 > URL: https://issues.apache.org/jira/browse/SPARK-27527 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 2.4.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Trivial > Fix For: 3.0.0 > > > Describe precisely semantic of TimestampType and DateType, how they represent > dates and timestamps internally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
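For context, the representations that the improved documentation describes can be sketched in a few lines (a minimal example, assuming a spark-shell session with a SparkSession named `spark`): DateType values are stored internally as a number of days since the Unix epoch 1970-01-01, and TimestampType values as a number of microseconds since the epoch.
{code}
import org.apache.spark.sql.functions._

// DATE '1970-01-11' is 10 days after the epoch; casting the timestamp to long
// exposes its epoch offset in seconds (internally it is kept in microseconds).
val df = spark.sql("SELECT DATE '1970-01-11' AS d, TIMESTAMP '1970-01-01 00:00:01' AS ts")
df.select(
  datediff(col("d"), lit("1970-01-01")).as("days_since_epoch"),
  col("ts").cast("long").as("seconds_since_epoch")
).show()
{code}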
[jira] [Resolved] (SPARK-27532) Correct the default value in the Documentation for "spark.redaction.regex"
[ https://issues.apache.org/jira/browse/SPARK-27532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-27532. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24428 [https://github.com/apache/spark/pull/24428] > Correct the default value in the Documentation for "spark.redaction.regex" > -- > > Key: SPARK-27532 > URL: https://issues.apache.org/jira/browse/SPARK-27532 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Shivu Sondur >Assignee: Shivu Sondur >Priority: Minor > Fix For: 3.0.0 > > > Correct the default value in the Documentation for "spark.redaction.regex". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27532) Correct the default value in the Documentation for "spark.redaction.regex"
[ https://issues.apache.org/jira/browse/SPARK-27532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-27532: Assignee: Shivu Sondur > Correct the default value in the Documentation for "spark.redaction.regex" > -- > > Key: SPARK-27532 > URL: https://issues.apache.org/jira/browse/SPARK-27532 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Shivu Sondur >Assignee: Shivu Sondur >Priority: Minor > > Correct the default value in the Documentation for "spark.redaction.regex". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27533) Date/timestamps CSV benchmarks
[ https://issues.apache.org/jira/browse/SPARK-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27533: --- Summary: Date/timestamps CSV benchmarks (was: CSV benchmarks date/timestamp ops ) > Date/timestamps CSV benchmarks > -- > > Key: SPARK-27533 > URL: https://issues.apache.org/jira/browse/SPARK-27533 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.1 >Reporter: Maxim Gekk >Priority: Minor > > Extend CSVBenchmark by new benchmarks: > - Write dates/timestamps to files > - Read/infer dates/timestamp from files > - Read/infer dates/timestamps from Dataset[String] > - to_csv/from_csv for dates/timestamps -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27533) CSV benchmarks date/timestamp ops
Maxim Gekk created SPARK-27533: -- Summary: CSV benchmarks date/timestamp ops Key: SPARK-27533 URL: https://issues.apache.org/jira/browse/SPARK-27533 Project: Spark Issue Type: Test Components: SQL Affects Versions: 2.4.1 Reporter: Maxim Gekk Extend CSVBenchmark by new benchmarks: - Write dates/timestamps to files - Read/infer dates/timestamp from files - Read/infer dates/timestamps from Dataset[String] - to_csv/from_csv for dates/timestamps -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27533) Date and timestamp CSV benchmarks
[ https://issues.apache.org/jira/browse/SPARK-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27533: --- Summary: Date and timestamp CSV benchmarks (was: Date/timestamps CSV benchmarks) > Date and timestamp CSV benchmarks > - > > Key: SPARK-27533 > URL: https://issues.apache.org/jira/browse/SPARK-27533 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.1 >Reporter: Maxim Gekk >Priority: Minor > > Extend CSVBenchmark by new benchmarks: > - Write dates/timestamps to files > - Read/infer dates/timestamp from files > - Read/infer dates/timestamps from Dataset[String] > - to_csv/from_csv for dates/timestamps -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
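As an illustration of the last bullet, a rough sketch of the round trip such a benchmark would time (assuming a spark-shell session on master, where the to_csv/from_csv functions are available):
{code}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Build rows with a timestamp column, format them with to_csv,
// then parse them back with from_csv -- the round trip to measure.
val withTs = spark.range(1000000).select(struct(col("id").cast("timestamp").as("ts")).as("r"))
val asCsv = withTs.select(to_csv(col("r")).as("line"))
val parsed = asCsv.select(
  from_csv(col("line"), new StructType().add("ts", TimestampType), Map.empty[String, String]).as("r"))
parsed.count()
{code}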
[jira] [Commented] (SPARK-27367) Faster RoaringBitmap Serialization with v0.8.0
[ https://issues.apache.org/jira/browse/SPARK-27367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822724#comment-16822724 ] Liang-Chi Hsieh commented on SPARK-27367: - I changed the Spark code to use the new API when upgrading to the new version of RoaringBitmap. The size of the bitmap is also related to the sparseness and distribution of empty blocks. I don't have a real workload that produces a big bitmap, so I manually created a HighlyCompressedMapStatus and benchmarked serializing/deserializing the bitmap inside it. I used a pretty big block sizes array for the HighlyCompressedMapStatus. I think we don't set such a number of partitions (1) on the reduce side. With this bitmap, I can see a small performance difference (9ms vs. 6ms) between the old and new serde APIs. {code} val conf = new SparkConf(false) conf.set(KRYO_REGISTRATION_REQUIRED, true) val ser = new KryoSerializer(conf).newInstance() val blockSizes = (0L until 1L).map { i => if (i % 2 == 0) { 0L } else { i } }.toArray val serialized = ser.serialize(HighlyCompressedMapStatus(BlockManagerId("exec-1", "host", 1234), blockSizes)) ser.deserialize(serialized) {code} > Faster RoaringBitmap Serialization with v0.8.0 > -- > > Key: SPARK-27367 > URL: https://issues.apache.org/jira/browse/SPARK-27367 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Imran Rashid >Priority: Major > > RoaringBitmap 0.8.0 adds faster serde, but also requires us to change how we > call the serde routines slightly to take advantage of it. This is probably a > worthwhile optimization as every shuffle map task with a large # of > partitions generates these bitmaps, and the driver especially has to > deserialize many of these messages. > See > * https://github.com/apache/spark/pull/24264#issuecomment-479675572 > * https://github.com/RoaringBitmap/RoaringBitmap/pull/325 > * https://github.com/RoaringBitmap/RoaringBitmap/issues/319 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24601) Bump Jackson version to 2.9.6
[ https://issues.apache.org/jira/browse/SPARK-24601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24601: -- Fix Version/s: 2.4.3 > Bump Jackson version to 2.9.6 > - > > Key: SPARK-24601 > URL: https://issues.apache.org/jira/browse/SPARK-24601 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Fokko Driesprong >Priority: Major > Fix For: 3.0.0, 2.4.3 > > > The Jackson version is lacking behind, and therefore I have to add a lot of > exclusions to the SBT files: > ``` > Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible > Jackson version: 2.9.5 > at > com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64) > at > com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19) > at > com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala:82) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
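For context, a hedged sketch of the kind of sbt workaround the description alludes to before this bump -- pinning a single Jackson version in the downstream build (sbt 1.x syntax; the module list is illustrative, not exhaustive):
{code}
// build.sbt -- force one Jackson version so jackson-module-scala and Spark's jackson-databind agree
dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.9.6",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.6",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.9.6"
)
{code}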
[jira] [Updated] (SPARK-27051) Bump Jackson version to 2.9.8
[ https://issues.apache.org/jira/browse/SPARK-27051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27051: -- Fix Version/s: 2.4.3 > Bump Jackson version to 2.9.8 > - > > Key: SPARK-27051 > URL: https://issues.apache.org/jira/browse/SPARK-27051 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Yanbo Liang >Assignee: Yanbo Liang >Priority: Major > Fix For: 3.0.0, 2.4.3 > > > Fasterxml Jackson version before 2.9.8 is affected by multiple CVEs > [[https://github.com/FasterXML/jackson-databind/issues/2186]], we need to fix > bump the dependent Jackson to 2.9.8. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24601) Bump Jackson version to 2.9.6
[ https://issues.apache.org/jira/browse/SPARK-24601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-24601: - Assignee: Fokko Driesprong > Bump Jackson version to 2.9.6 > - > > Key: SPARK-24601 > URL: https://issues.apache.org/jira/browse/SPARK-24601 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 3.0.0, 2.4.3 > > > The Jackson version is lacking behind, and therefore I have to add a lot of > exclusions to the SBT files: > ``` > Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible > Jackson version: 2.9.5 > at > com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64) > at > com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19) > at > com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala:82) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27439) Explain result should match collected result after view change
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27439: -- Summary: Explain result should match collected result after view change (was: createOrReplaceTempView cannot update old dataset) > Explain result should match collected result after view change > -- > > Key: SPARK-27439 > URL: https://issues.apache.org/jira/browse/SPARK-27439 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.1 >Reporter: xjl >Priority: Major > > {code:java} > SparkSession spark = SparkSession > .builder() > .appName("app").enableHiveSupport().master("local[4]") > .getOrCreate(); > spark.sql("select * from default.t1").createOrReplaceTempView("tmp001"); > Dataset hiveTable = spark.sql("select * from tmp001"); > spark.sql("select * from default.t2").createOrReplaceTempView("tmp001"); > hiveTable.show(); > > } > {code} > hiveTable show the value of t1 but not t2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27439) Explain result should match collected result after view change
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27439: -- Priority: Minor (was: Major) > Explain result should match collected result after view change > -- > > Key: SPARK-27439 > URL: https://issues.apache.org/jira/browse/SPARK-27439 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.1 >Reporter: xjl >Priority: Minor > > {code:java} > SparkSession spark = SparkSession > .builder() > .appName("app").enableHiveSupport().master("local[4]") > .getOrCreate(); > spark.sql("select * from default.t1").createOrReplaceTempView("tmp001"); > Dataset hiveTable = spark.sql("select * from tmp001"); > spark.sql("select * from default.t2").createOrReplaceTempView("tmp001"); > hiveTable.show(); > > } > {code} > hiveTable show the value of t1 but not t2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27439) Use analyzed plan when explaining Dataset
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27439: -- Summary: Use analyzed plan when explaining Dataset (was: Explain result should match collected result after view change) > Use analyzed plan when explaining Dataset > - > > Key: SPARK-27439 > URL: https://issues.apache.org/jira/browse/SPARK-27439 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.1 >Reporter: xjl >Priority: Minor > > {code:java} > SparkSession spark = SparkSession > .builder() > .appName("app").enableHiveSupport().master("local[4]") > .getOrCreate(); > spark.sql("select * from default.t1").createOrReplaceTempView("tmp001"); > Dataset hiveTable = spark.sql("select * from tmp001"); > spark.sql("select * from default.t2").createOrReplaceTempView("tmp001"); > hiveTable.show(); > > } > {code} > hiveTable show the value of t1 but not t2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27439) Use analyzed plan when explaining Dataset
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27439: -- Issue Type: Improvement (was: Bug) > Use analyzed plan when explaining Dataset > - > > Key: SPARK-27439 > URL: https://issues.apache.org/jira/browse/SPARK-27439 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: xjl >Priority: Minor > > {code:java} > SparkSession spark = SparkSession > .builder() > .appName("app").enableHiveSupport().master("local[4]") > .getOrCreate(); > spark.sql("select * from default.t1").createOrReplaceTempView("tmp001"); > Dataset hiveTable = spark.sql("select * from tmp001"); > spark.sql("select * from default.t2").createOrReplaceTempView("tmp001"); > hiveTable.show(); > > } > {code} > hiveTable show the value of t1 but not t2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27439) Use analyzed plan when explaining Dataset
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27439: -- Description: {code} scala> spark.range(10).createOrReplaceTempView("test") scala> spark.range(5).createOrReplaceTempView("test2") scala> spark.sql("select * from test").createOrReplaceTempView("tmp001") scala> val df = spark.sql("select * from tmp001") scala> spark.sql("select * from test2").createOrReplaceTempView("tmp001") scala> df.show +---+ | id| +---+ | 0| | 1| | 2| | 3| | 4| | 5| | 6| | 7| | 8| | 9| +---+ scala> df.explain {code} Before: {code} == Physical Plan == *(1) Range (0, 5, step=1, splits=12) {code} After: {code} == Physical Plan == *(1) Range (0, 10, step=1, splits=12) {code} was: {code:java} SparkSession spark = SparkSession .builder() .appName("app").enableHiveSupport().master("local[4]") .getOrCreate(); spark.sql("select * from default.t1").createOrReplaceTempView("tmp001"); Dataset hiveTable = spark.sql("select * from tmp001"); spark.sql("select * from default.t2").createOrReplaceTempView("tmp001"); hiveTable.show(); } {code} hiveTable show the value of t1 but not t2 > Use analyzed plan when explaining Dataset > - > > Key: SPARK-27439 > URL: https://issues.apache.org/jira/browse/SPARK-27439 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: xjl >Priority: Minor > > {code} > scala> spark.range(10).createOrReplaceTempView("test") > scala> spark.range(5).createOrReplaceTempView("test2") > scala> spark.sql("select * from test").createOrReplaceTempView("tmp001") > scala> val df = spark.sql("select * from tmp001") > scala> spark.sql("select * from test2").createOrReplaceTempView("tmp001") > scala> df.show > +---+ > | id| > +---+ > | 0| > | 1| > | 2| > | 3| > | 4| > | 5| > | 6| > | 7| > | 8| > | 9| > +---+ > scala> df.explain > {code} > Before: > {code} > == Physical Plan == > *(1) Range (0, 5, step=1, splits=12) > {code} > After: > {code} > == Physical Plan == > *(1) Range (0, 10, step=1, splits=12) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27439) Use analyzed plan when explaining Dataset
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27439. --- Resolution: Fixed Assignee: Liang-Chi Hsieh Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/24415 > Use analyzed plan when explaining Dataset > - > > Key: SPARK-27439 > URL: https://issues.apache.org/jira/browse/SPARK-27439 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: xjl >Assignee: Liang-Chi Hsieh >Priority: Minor > Fix For: 3.0.0 > > > {code} > scala> spark.range(10).createOrReplaceTempView("test") > scala> spark.range(5).createOrReplaceTempView("test2") > scala> spark.sql("select * from test").createOrReplaceTempView("tmp001") > scala> val df = spark.sql("select * from tmp001") > scala> spark.sql("select * from test2").createOrReplaceTempView("tmp001") > scala> df.show > +---+ > | id| > +---+ > | 0| > | 1| > | 2| > | 3| > | 4| > | 5| > | 6| > | 7| > | 8| > | 9| > +---+ > scala> df.explain > {code} > Before: > {code} > == Physical Plan == > *(1) Range (0, 5, step=1, splits=12) > {code} > After: > {code} > == Physical Plan == > *(1) Range (0, 10, step=1, splits=12) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27529) Spark Streaming consumer dies with kafka.common.OffsetOutOfRangeException
[ https://issues.apache.org/jira/browse/SPARK-27529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Goldenberg updated SPARK-27529: -- Description: We have a Spark Streaming consumer which at a certain point started consistently failing upon a restart with the below error. Some details: * Spark version is 1.5.0. * Kafka version is 0.8.2.1 (2.10-0.8.2.1). * The topic is configured with: retention.ms=1471228928, max.message.bytes=1. * The consumer runs with auto.offset.reset=smallest. * No checkpointing is currently enabled. I don't see anything in the Spark or Kafka doc to understand why this is happening. From googling around, {noformat} https://blog.cloudera.com/blog/2015/03/exactly-once-spark-streaming-from-apache-kafka/ Finally, I’ll repeat that any semantics beyond at-most-once require that you have sufficient log retention in Kafka. If you’re seeing things like OffsetOutOfRangeException, it’s probably because you underprovisioned Kafka storage, not because something’s wrong with Spark or Kafka.{noformat} Also looking at SPARK-12693 and SPARK-11693, I don't understand the possible causes. {noformat} You've under-provisioned Kafka storage and / or Spark compute capacity. The result is that data is being deleted before it has been processed.{noformat} All we're trying to do is start the consumer and consume from the topic from the earliest available offset. Why would we not be able to do that? How can the offsets be out of range if we're saying, just read from the earliest available? Since we have the retention.ms set to 1 year and we created the topic just a few weeks ago, I'd not expect any deletion being done by Kafka as we're consuming. I'd like to understand the actual cause of this error. Any recommendations on a workaround would be appreciated. 
Stack traces: {noformat} 2019-04-19 11:35:17,147 ERROR org.apache.spark.scheduler .TaskSetManager: Task 10 in stage 147.0 failed 4 times; aborting job 2019-04-19 11:35:17,160 ERROR org.apache.spark.streaming.scheduler.JobScheduler: Error running job streaming job 1555682554000 ms.0 org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 147.0 failed 4 times, most recent failure: Lost task 10.3 in stage 147.0 (TID 2368, 10.150.0.58): kafka.common.OffsetOutOfRangeException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at java.lang.Class.newInstance(Class.java:442) at kafka.common.ErrorMapping$.exceptionFor(ErrorMapping.scala:86) at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.handleFetchErr(KafkaRDD.scala:184) at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.fetchBatch(KafkaRDD.scala:193) at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.getNext(KafkaRDD.scala:208) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29) at com.acme.consumer.kafka.spark.ProcessPartitionFunction.call(ProcessPartitionFunction.java:69) at com.acme.consumer.kafka.spark.ProcessPartitionFunction.call(ProcessPartitionFunction.java:24) at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222) at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:898) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:898) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.sca la:1280) ~[spark-assembly-1.5.0-hadoop2.4.0.jar:1.5.0] at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268) ~[spark-assembly-1.5.0-hadoop2.4 .0.jar:1.5.0] at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267) ~
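For reference, a minimal sketch of the direct-stream setup the description refers to (Spark 1.5 / Kafka 0.8 integration; broker and topic names are hypothetical, and a SparkContext `sc` is assumed), with auto.offset.reset=smallest so the consumer starts from the earliest retained offset:
{code}
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(sc, Seconds(10))
val kafkaParams = Map(
  "metadata.broker.list" -> "broker1:9092", // hypothetical broker
  "auto.offset.reset" -> "smallest")        // start from the earliest offset Kafka still retains
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("mytopic"))         // hypothetical topic
stream.foreachRDD(rdd => println(rdd.count()))
ssc.start()
{code}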
[jira] [Resolved] (SPARK-27473) Support filter push down for status fields in binary file data source
[ https://issues.apache.org/jira/browse/SPARK-27473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-27473. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24387 [https://github.com/apache/spark/pull/24387] > Support filter push down for status fields in binary file data source > - > > Key: SPARK-27473 > URL: https://issues.apache.org/jira/browse/SPARK-27473 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Weichen Xu >Priority: Major > Fix For: 3.0.0 > > > As a user, I can use > `spark.read.format("binaryFile").load(path).filter($"status.length" < > 100000000L)` to load files that are less than 1e8 bytes. Spark shouldn't even > read files that are bigger than 1e8 bytes in this case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25348) Data source for binary files
[ https://issues.apache.org/jira/browse/SPARK-25348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818640#comment-16818640 ] Xiangrui Meng edited comment on SPARK-25348 at 4/21/19 7:49 PM: I created follow-up tasks: * Documentation: SPARK-27472 * Filter push down: SPARK-27473 * Content column pruning: SPARK-27534 was (Author: mengxr): I created two follow-up tasks: * Documentation: SPARK-27472 * Filter push down: SPARK-27473 > Data source for binary files > > > Key: SPARK-25348 > URL: https://issues.apache.org/jira/browse/SPARK-25348 > Project: Spark > Issue Type: Story > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Weichen Xu >Priority: Major > Fix For: 3.0.0 > > > It would be useful to have a data source implementation for binary files, > which can be used to build features to load images, audio, and videos. > Microsoft has an implementation at > [https://github.com/Azure/mmlspark/tree/master/src/io/binary.] It would be > great if we can merge it into Spark main repo. > cc: [~mhamilton] and [~imatiach] > Proposed API: > Format name: "binaryFile" > Schema: > * content: BinaryType > * status (following Hadoop FIleStatus): > ** path: StringType > ** modificationTime: Timestamp > ** length: LongType (size limit 2GB) > Options: > * pathGlobFilter: only include files with path matching the glob pattern > Input partition size can be controlled by common SQL confs: maxPartitionBytes > and openCostInBytes -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27534) Do not load `content` column in binary data source if it is not selected
Xiangrui Meng created SPARK-27534: - Summary: Do not load `content` column in binary data source if it is not selected Key: SPARK-27534 URL: https://issues.apache.org/jira/browse/SPARK-27534 Project: Spark Issue Type: Story Components: SQL Affects Versions: 3.0.0 Reporter: Xiangrui Meng A follow-up task from SPARK-25348. To save I/O cost, Spark shouldn't attempt to read the file if users didn't request the `content` column. For example: {code} spark.read.format("binaryFile").load(path).filter($"length" < 100).count() {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27535) Date and timestamp JSON benchmarks
Maxim Gekk created SPARK-27535: -- Summary: Date and timestamp JSON benchmarks Key: SPARK-27535 URL: https://issues.apache.org/jira/browse/SPARK-27535 Project: Spark Issue Type: Test Components: SQL Affects Versions: 2.4.1 Reporter: Maxim Gekk Extend JSONBenchmark by new benchmarks: * Write dates/timestamps to files * Read/infer dates/timestamp from files * Read/infer dates/timestamps from Dataset[String] * to_json/from_json for dates/timestamps -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
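As an illustration of the Dataset[String] bullet, a rough sketch (assuming a spark-shell session) of reading timestamps from an in-memory Dataset[String] of JSON records with an explicit schema and timestamp format:
{code}
import org.apache.spark.sql.types._
import spark.implicits._

// One JSON record per element; parse the ts field as TimestampType.
val ds = spark.range(1000000).map(i => s"""{"ts":"1970-01-01 00:00:${"%02d".format(i % 60)}"}""")
spark.read
  .schema(new StructType().add("ts", TimestampType))
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
  .json(ds)
  .count()
{code}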
[jira] [Commented] (SPARK-27287) PCAModel.load() does not honor spark configs
[ https://issues.apache.org/jira/browse/SPARK-27287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822759#comment-16822759 ] Dharmesh Kakadia commented on SPARK-27287: -- [~mgaido] what do you mean by : use the sparkSession when reading ML models ? Also, for whats its worth, if I use the following to set the config, the same PCAModel.load() call works. spark._jsc.hadoopConfiguration().set("fs.azure.account.key.test.blob.core.windows.net","Xosad==") > PCAModel.load() does not honor spark configs > > > Key: SPARK-27287 > URL: https://issues.apache.org/jira/browse/SPARK-27287 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.0 >Reporter: Dharmesh Kakadia >Priority: Major > > PCAModel.load() does not seem to be using the configurations set on the > current spark session. > Repro: > > The following will fail to read the data because the storage account > credentials config used/propagated. > conf.set("fs.azure.account.key.test.blob.core.windows.net","Xosad==") > spark = > SparkSession.builder.appName("dharmesh").config(conf=conf).master('spark://spark-master:7077').getOrCreate() > model = PCAModel.load('wasb://t...@test.blob.core.windows.net/model') > > The following however works: > conf.set("fs.azure.account.key.test.blob.core.windows.net","Xosad==") > spark = > SparkSession.builder.appName("dharmesh").config(conf=conf).master('spark://spark-master:7077').getOrCreate() > blah = > spark.read.json('wasb://t...@test.blob.core.windows.net/somethingelse/') > blah.show() > model = PCAModel.load('wasb://t...@test.blob.core.windows.net/model') > > It looks like spark.read...() does force the use of the config once and then > PCAModel.load() will work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
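A hedged Scala equivalent of the workaround in the comment above -- setting the storage key directly on the SparkContext's Hadoop configuration before loading the model (the account, container, and key below are placeholders):
{code}
import org.apache.spark.ml.feature.PCAModel

// Counterpart of spark._jsc.hadoopConfiguration().set(...) in PySpark:
// put the key into the Hadoop configuration that the ML reader's FileSystem will see.
spark.sparkContext.hadoopConfiguration
  .set("fs.azure.account.key.test.blob.core.windows.net", "Xosad==")
val model = PCAModel.load("wasb://mycontainer@test.blob.core.windows.net/model")
{code}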
[jira] [Updated] (SPARK-27274) Refer to Scala 2.12 in docs; deprecate Scala 2.11 support in 2.4.1
[ https://issues.apache.org/jira/browse/SPARK-27274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-27274: -- Labels: release-notes (was: ) > Refer to Scala 2.12 in docs; deprecate Scala 2.11 support in 2.4.1 > -- > > Key: SPARK-27274 > URL: https://issues.apache.org/jira/browse/SPARK-27274 > Project: Spark > Issue Type: Task > Components: Documentation, Spark Core >Affects Versions: 2.4.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > Labels: release-notes > Fix For: 2.4.1 > > > Scala 2.11 support is removed in Spark 3.0, so we should at least call it > deprecated in 2.4.x. > The 2.4.x docs currently refer to Scala 2.11 artifacts. As 2.12 has been > supported since 2.4.0 without any significant issues, we should refer to 2.12 > artifacts in the docs by default as well. > You could say this implicitly declares it 'unexperimental', if it ever was > deemed experimental, as we'd certainly support 2.12 and not change that > support for the foreseeable future. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24601) Bump Jackson version to 2.9.6
[ https://issues.apache.org/jira/browse/SPARK-24601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-24601: -- Labels: release-notes (was: ) > Bump Jackson version to 2.9.6 > - > > Key: SPARK-24601 > URL: https://issues.apache.org/jira/browse/SPARK-24601 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: release-notes > Fix For: 3.0.0, 2.4.3 > > > The Jackson version is lacking behind, and therefore I have to add a lot of > exclusions to the SBT files: > ``` > Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible > Jackson version: 2.9.5 > at > com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64) > at > com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19) > at > com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala:82) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24601) Bump Jackson version to 2.9.6
[ https://issues.apache.org/jira/browse/SPARK-24601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-24601: -- Docs Text: Spark's Jackson dependency has been updated from 2.6.x to 2.9.x. User applications that inherit Spark's Jackson version should note that various Jackson behaviors changed between these releases. > Bump Jackson version to 2.9.6 > - > > Key: SPARK-24601 > URL: https://issues.apache.org/jira/browse/SPARK-24601 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: release-notes > Fix For: 3.0.0, 2.4.3 > > > The Jackson version is lacking behind, and therefore I have to add a lot of > exclusions to the SBT files: > ``` > Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible > Jackson version: 2.9.5 > at > com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64) > at > com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19) > at > com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala:82) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27419) When setting spark.executor.heartbeatInterval to a value less than 1 seconds, it will always fail
[ https://issues.apache.org/jira/browse/SPARK-27419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-27419: -- Labels: release-notes (was: ) > When setting spark.executor.heartbeatInterval to a value less than 1 seconds, > it will always fail > - > > Key: SPARK-27419 > URL: https://issues.apache.org/jira/browse/SPARK-27419 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 2.4.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > Fix For: 2.4.2 > > > When setting spark.executor.heartbeatInterval to a value less than 1 seconds > in branch-2.4, it will always fail because the value will be converted to 0 > and the heartbeat will always timeout and finally kill the executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27480) Improve explain output of describe query command to show the actual input query as opposed to a truncated logical plan.
[ https://issues.apache.org/jira/browse/SPARK-27480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27480: -- Issue Type: Improvement (was: Bug) > Improve explain output of describe query command to show the actual input > query as opposed to a truncated logical plan. > --- > > Key: SPARK-27480 > URL: https://issues.apache.org/jira/browse/SPARK-27480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: Dilip Biswal >Priority: Minor > > Currently running explain on describe query gives a little confusing output. > Instead of showing the actual query that is input by the user, it shows the > truncated logical plan as the input. We should improve it to show the query > text as input by user. > Here are the sample outputs of the explain command. > > {code:java} > EXPLAIN DESCRIBE WITH s AS (SELECT 'hello' as col1) SELECT * FROM s; > == Physical Plan == > Execute DescribeQueryCommand >+- DescribeQueryCommand CTE [s] > {code} > {code:java} > EXPLAIN EXTENDED DESCRIBE SELECT * from s1 where c1 > 0; > == Physical Plan == > Execute DescribeQueryCommand >+- DescribeQueryCommand 'Project [*] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27480) Improve `EXPLAIN DESC QUERY` to show the input SQL statement
[ https://issues.apache.org/jira/browse/SPARK-27480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27480. --- Resolution: Fixed Assignee: Dilip Biswal Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/24385 > Improve `EXPLAIN DESC QUERY` to show the input SQL statement > > > Key: SPARK-27480 > URL: https://issues.apache.org/jira/browse/SPARK-27480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Minor > Fix For: 3.0.0 > > > Currently running explain on describe query gives a little confusing output. > Instead of showing the actual query that is input by the user, it shows the > truncated logical plan as the input. We should improve it to show the query > text as input by user. > Here are the sample outputs of the explain command. > > {code:java} > EXPLAIN DESCRIBE WITH s AS (SELECT 'hello' as col1) SELECT * FROM s; > == Physical Plan == > Execute DescribeQueryCommand >+- DescribeQueryCommand CTE [s] > {code} > {code:java} > EXPLAIN EXTENDED DESCRIBE SELECT * from s1 where c1 > 0; > == Physical Plan == > Execute DescribeQueryCommand >+- DescribeQueryCommand 'Project [*] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27480) Improve `EXPLAIN DESC QUERY` to show the input SQL statement
[ https://issues.apache.org/jira/browse/SPARK-27480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27480: -- Summary: Improve `EXPLAIN DESC QUERY` to show the input SQL statement (was: Improve explain output of describe query command to show the actual input query as opposed to a truncated logical plan.) > Improve `EXPLAIN DESC QUERY` to show the input SQL statement > > > Key: SPARK-27480 > URL: https://issues.apache.org/jira/browse/SPARK-27480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: Dilip Biswal >Priority: Minor > > Currently running explain on describe query gives a little confusing output. > Instead of showing the actual query that is input by the user, it shows the > truncated logical plan as the input. We should improve it to show the query > text as input by user. > Here are the sample outputs of the explain command. > > {code:java} > EXPLAIN DESCRIBE WITH s AS (SELECT 'hello' as col1) SELECT * FROM s; > == Physical Plan == > Execute DescribeQueryCommand >+- DescribeQueryCommand CTE [s] > {code} > {code:java} > EXPLAIN EXTENDED DESCRIBE SELECT * from s1 where c1 > 0; > == Physical Plan == > Execute DescribeQueryCommand >+- DescribeQueryCommand 'Project [*] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27513) Spark tarball with binaries should have files owned by uid 0
[ https://issues.apache.org/jira/browse/SPARK-27513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822789#comment-16822789 ] koert kuipers commented on SPARK-27513: --- i think this can be closed as wont fix > Spark tarball with binaries should have files owned by uid 0 > > > Key: SPARK-27513 > URL: https://issues.apache.org/jira/browse/SPARK-27513 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.1 >Reporter: koert kuipers >Priority: Minor > Fix For: 3.0.0 > > > currently the tarball is created in dev/make-distribution.sh like this: > {code:bash} > tar czf "spark-$VERSION-bin-$NAME.tgz" -C "$SPARK_HOME" "$TARDIR_NAME" > {code} > the problem with this is that if root unpacks this tarball the files are > owned by whatever the uid is of the person that created the tarball. this uid > probably doesnt exist or belongs to a different unrelated user. this is > unlikely to be what anyone wants. > for other users this problem doesnt exist since tar is now allowed to change > uid. so when they unpack the tarball the files are owned by them. > it is more typical to set the uid and gid to 0 for a tarball. that way when > root unpacks it the files are owned by root. so like this: > {code:bash} > tar czf "spark-$VERSION-bin-$NAME.tgz" --numeric-owner --owner=0 --group=0 -C > "$SPARK_HOME" "$TARDIR_NAME" > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27512) Decimal parsing leads to unexpected type inference
[ https://issues.apache.org/jira/browse/SPARK-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822790#comment-16822790 ] koert kuipers commented on SPARK-27512: --- [~maxgekk] max do you know why getDecimalParser has that if condition for Locale US where it calls {code} (s: String) => new java.math.BigDecimal(s.replaceAll(",", "")) {code} i think it's that {code}s.replaceAll(",", ""){code} that is causing my issues. i saw it was introduced in: {code} commit 7a83d71403edf7d24fa5efc0ef913f3ce76d88b8 Author: Maxim Gekk Date: Thu Nov 29 22:15:12 2018 +0800 [SPARK-26163][SQL] Parsing decimals from JSON using locale {code} > Decimal parsing leads to unexpected type inference > -- > > Key: SPARK-27512 > URL: https://issues.apache.org/jira/browse/SPARK-27512 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 > Environment: spark 3.0.0-SNAPSHOT from this commit: > {code:bash} > commit 3ab96d7acf870e53c9016b0b63d0b328eec23bed > Author: Dilip Biswal > Date: Mon Apr 15 21:26:45 2019 +0800 > {code} >Reporter: koert kuipers >Priority: Minor > > {code:bash} > $ hadoop fs -text test.bsv > x|y > 1|1,2 > 2|2,3 > 3|3,4 > {code} > in spark 2.4.1: > {code:bash} > scala> val data = spark.read.format("csv").option("header", > true).option("delimiter", "|").option("inferSchema", true).load("test.bsv") > scala> data.printSchema > root > |-- x: integer (nullable = true) > |-- y: string (nullable = true) > scala> data.show > +---+---+ > | x| y| > +---+---+ > | 1|1,2| > | 2|2,3| > | 3|3,4| > +---+---+ > {code} > in spark 3.0.0-SNAPSHOT: > {code:bash} > scala> val data = spark.read.format("csv").option("header", > true).option("delimiter", "|").option("inferSchema", true).load("test.bsv") > scala> data.printSchema > root > |-- x: integer (nullable = true) > |-- y: decimal(2,0) (nullable = true) > scala> data.show > +---+---+ > | x| y| > +---+---+ > | 1| 12| > | 2| 23| > | 3| 34| > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27512) Decimal parsing leads to unexpected type inference
[ https://issues.apache.org/jira/browse/SPARK-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822790#comment-16822790 ] koert kuipers edited comment on SPARK-27512 at 4/21/19 11:03 PM: - [~maxgekk] maxim do you know why getDecimalParser has that if condition for Locale US where it calls {code:java} (s: String) => new java.math.BigDecimal(s.replaceAll(",", "")) {code} i think it's that {code:java} s.replaceAll(",", ""){code} that is causing my issues. i saw it was introduced in: {code:java} commit 7a83d71403edf7d24fa5efc0ef913f3ce76d88b8 Author: Maxim Gekk Date: Thu Nov 29 22:15:12 2018 +0800 [SPARK-26163][SQL] Parsing decimals from JSON using locale {code} was (Author: koert): [~maxgekk] max do you know why getDecimalParser has that if condition for Locale US where it calls {code} (s: String) => new java.math.BigDecimal(s.replaceAll(",", "")) {code} i think it's that {code}s.replaceAll(",", ""){code} that is causing my issues. i saw it was introduced in: {code} commit 7a83d71403edf7d24fa5efc0ef913f3ce76d88b8 Author: Maxim Gekk Date: Thu Nov 29 22:15:12 2018 +0800 [SPARK-26163][SQL] Parsing decimals from JSON using locale {code} > Decimal parsing leads to unexpected type inference > -- > > Key: SPARK-27512 > URL: https://issues.apache.org/jira/browse/SPARK-27512 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 > Environment: spark 3.0.0-SNAPSHOT from this commit: > {code:bash} > commit 3ab96d7acf870e53c9016b0b63d0b328eec23bed > Author: Dilip Biswal > Date: Mon Apr 15 21:26:45 2019 +0800 > {code} >Reporter: koert kuipers >Priority: Minor > > {code:bash} > $ hadoop fs -text test.bsv > x|y > 1|1,2 > 2|2,3 > 3|3,4 > {code} > in spark 2.4.1: > {code:bash} > scala> val data = spark.read.format("csv").option("header", > true).option("delimiter", "|").option("inferSchema", true).load("test.bsv") > scala> data.printSchema > root > |-- x: integer (nullable = true) > |-- y: string (nullable = true) > scala> data.show > +---+---+ > | x| y| > +---+---+ > | 1|1,2| > | 2|2,3| > | 3|3,4| > +---+---+ > {code} > in spark 3.0.0-SNAPSHOT: > {code:bash} > scala> val data = spark.read.format("csv").option("header", > true).option("delimiter", "|").option("inferSchema", true).load("test.bsv") > scala> data.printSchema > root > |-- x: integer (nullable = true) > |-- y: decimal(2,0) (nullable = true) > scala> data.show > +---+---+ > | x| y| > +---+---+ > | 1| 12| > | 2| 23| > | 3| 34| > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27512) Decimal parsing leads to unexpected type inference
[ https://issues.apache.org/jira/browse/SPARK-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822796#comment-16822796 ] koert kuipers commented on SPARK-27512: --- seems DecimalFormat.parse also simply ignores commas. still unclear to me why we have to do same in the "special handling of default locale for backwards compatibility" but i am guessing that has to do with json parsing backwards compatibility, not csv backwards compatibility. > Decimal parsing leads to unexpected type inference > -- > > Key: SPARK-27512 > URL: https://issues.apache.org/jira/browse/SPARK-27512 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 > Environment: spark 3.0.0-SNAPSHOT from this commit: > {code:bash} > commit 3ab96d7acf870e53c9016b0b63d0b328eec23bed > Author: Dilip Biswal > Date: Mon Apr 15 21:26:45 2019 +0800 > {code} >Reporter: koert kuipers >Priority: Minor > > {code:bash} > $ hadoop fs -text test.bsv > x|y > 1|1,2 > 2|2,3 > 3|3,4 > {code} > in spark 2.4.1: > {code:bash} > scala> val data = spark.read.format("csv").option("header", > true).option("delimiter", "|").option("inferSchema", true).load("test.bsv") > scala> data.printSchema > root > |-- x: integer (nullable = true) > |-- y: string (nullable = true) > scala> data.show > +---+---+ > | x| y| > +---+---+ > | 1|1,2| > | 2|2,3| > | 3|3,4| > +---+---+ > {code} > in spark 3.0.0-SNAPSHOT: > {code:bash} > scala> val data = spark.read.format("csv").option("header", > true).option("delimiter", "|").option("inferSchema", true).load("test.bsv") > scala> data.printSchema > root > |-- x: integer (nullable = true) > |-- y: decimal(2,0) (nullable = true) > scala> data.show > +---+---+ > | x| y| > +---+---+ > | 1| 12| > | 2| 23| > | 3| 34| > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
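To make the two parsing paths discussed above concrete, a small standalone sketch (not the actual getDecimalParser code) showing that both the US-locale fast path and DecimalFormat turn "1,2" into 12:
{code}
import java.text.{DecimalFormat, NumberFormat, ParsePosition}
import java.util.Locale

// US-locale fast path quoted above: strip grouping commas, then parse.
val viaReplace = new java.math.BigDecimal("1,2".replaceAll(",", ""))

// Locale-aware path: DecimalFormat treats ',' as a grouping separator and also yields 12.
val fmt = NumberFormat.getInstance(Locale.US).asInstanceOf[DecimalFormat]
fmt.setParseBigDecimal(true)
val viaFormat = fmt.parse("1,2", new ParsePosition(0))

println(s"$viaReplace $viaFormat") // 12 12
{code}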
[jira] [Resolved] (SPARK-27496) RPC should send back the fatal errors
[ https://issues.apache.org/jira/browse/SPARK-27496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27496. --- Resolution: Fixed Fix Version/s: 2.4.3 3.0.0 2.3.4 This is resolved via https://github.com/apache/spark/pull/24396 > RPC should send back the fatal errors > - > > Key: SPARK-27496 > URL: https://issues.apache.org/jira/browse/SPARK-27496 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.3.4, 3.0.0, 2.4.3 > > > Right now, when a fatal error throws from "receiveAndReply", the sender will > not be notified. We should try our best to send it back. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27536) Code improvements for 3.0: existentials edition
Sean Owen created SPARK-27536: - Summary: Code improvements for 3.0: existentials edition Key: SPARK-27536 URL: https://issues.apache.org/jira/browse/SPARK-27536 Project: Spark Issue Type: Improvement Components: ML, Spark Core, SQL, Structured Streaming Affects Versions: 3.0.0 Reporter: Sean Owen Assignee: Sean Owen The Spark code base makes use of 'existential types' in Scala, a language feature which is quasi-deprecated -- it generates a warning unless scala.language.existentials is imported, and there is talk of removing it from future Scala versions: https://contributors.scala-lang.org/t/proposal-to-remove-existential-types-from-the-language/2785 We can get rid of most usages of this feature with lots of minor changes to the code. A PR is coming to demonstrate what's involved. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27537) spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object
dingwei2019 created SPARK-27537: --- Summary: spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object Key: SPARK-27537 URL: https://issues.apache.org/jira/browse/SPARK-27537 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 2.4.1, 2.3.0 Environment: Machine:aarch64 OS:Red Hat Enterprise Linux Server release 7.4 Kernel:4.11.0-44.el7a spark version: spark-2.4.1 java:openjdk version "11.0.2" 2019-01-15 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) scala:2.11.12 gcc version:4.8.5 Reporter: dingwei2019 [ERROR]: [Error] $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object [ERROR]: [Error] $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value size is not a member of Object ERROR: two errors found Below is the related code: 856 test("toString") { 857 val empty = Matrices.ones(0, 0) 858 empty.toString(0, 0) 859 860 val mat = Matrices.rand(5, 10, new Random()) 861 mat.toString(-1, -5) 862 mat.toString(0, 0) 863 mat.toString(Int.MinValue, Int.MinValue) 864 mat.toString(Int.MaxValue, Int.MaxValue) 865 var lines = mat.toString(6, 50).lines.toArray 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) 867 868 lines = mat.toString(5, 100).lines.toArray 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) 870 } 871 872 test("numNonzeros and numActives") { 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) 874 assert(dm1.numNonzeros === 3) 875 assert(dm1.numActives === 6) 876 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), Array(0.0, -1.2, 0.0)) 878 assert(sm1.numNonzeros === 1) 879 assert(sm1.numActives === 3) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27537) spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object
[ https://issues.apache.org/jira/browse/SPARK-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dingwei2019 updated SPARK-27537: Docs Text: (was: the question is found in spark ml test module, althrough this is an test module, i want to figure it out. from the describe above, it seems an incompatible problem between java 11 and scala 2.11.12. if I change my jdk to jdk8, and there is no problem. Below is my analysis: it seems in spark if a method has implementation in java, spark will use java method, or will use scala method. 'string' class in java11 adds the lines method. This method conflicts with the scala syntax. scala has lines method in 'stringlike' class, the method return an Iterator; Iterator in scala has a toArray method, the method return an Array; the class array in scala has a size method. so if spark use scala method, it will have no problem. lines(Iterator)-->toArray(Array)-->size But Java11 adds lines method in 'string', this will return a Stream; Stream in java11 has toArray method, and will return Object; Object has no 'size' method. This is what the error says. (Stream)-->(Object)toArray-->has no size method.) > spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > - > > Key: SPARK-27537 > URL: https://issues.apache.org/jira/browse/SPARK-27537 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.3.0, 2.4.1 > Environment: Machine:aarch64 > OS:Red Hat Enterprise Linux Server release 7.4 > Kernel:4.11.0-44.el7a > spark version: spark-2.4.1 > java:openjdk version "11.0.2" 2019-01-15 > OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) > scala:2.11.12 > gcc version:4.8.5 >Reporter: dingwei2019 >Priority: Major > Labels: build, test > > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value > size is not a member of Object > ERROR: two errors found > Below is the related code: > 856 test("toString") { > 857 val empty = Matrices.ones(0, 0) > 858 empty.toString(0, 0) > 859 > 860 val mat = Matrices.rand(5, 10, new Random()) > 861 mat.toString(-1, -5) > 862 mat.toString(0, 0) > 863 mat.toString(Int.MinValue, Int.MinValue) > 864 mat.toString(Int.MaxValue, Int.MaxValue) > 865 var lines = mat.toString(6, 50).lines.toArray > 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) > 867 > 868 lines = mat.toString(5, 100).lines.toArray > 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) > 870 } > 871 > 872 test("numNonzeros and numActives") { > 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) > 874 assert(dm1.numNonzeros === 3) > 875 assert(dm1.numActives === 6) > 876 > 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), > Array(0.0, -1.2, 0.0)) > 878 assert(sm1.numNonzeros === 1) > 879 assert(sm1.numActives === 3) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27537) spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object
[ https://issues.apache.org/jira/browse/SPARK-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822845#comment-16822845 ] dingwei2019 commented on SPARK-27537: - The problem was found in the Spark ML test module. Although it is only a test module, I want to understand it. From the description above it looks like an incompatibility between Java 11 and Scala 2.11.12; if I switch my JDK to JDK 8, the problem goes away. Below is my analysis: it seems that when a method exists on the Java class, the Java method is resolved, and otherwise the Scala one is used. The String class in Java 11 adds a lines() method, and this conflicts with the Scala API: Scala has a lines method in StringLike that returns an Iterator; Iterator has a toArray method that returns an Array; and Array has a size method. So if the Scala method were chosen, there would be no problem: lines (Iterator) --> toArray (Array) --> size. But Java 11's new String.lines() returns a Stream; Stream.toArray() returns Object[], and Object has no size member. That is exactly what the error reports: (Stream) --> toArray (Object) --> no size method. > spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > - > > Key: SPARK-27537 > URL: https://issues.apache.org/jira/browse/SPARK-27537 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.3.0, 2.4.1 > Environment: Machine:aarch64 > OS:Red Hat Enterprise Linux Server release 7.4 > Kernel:4.11.0-44.el7a > spark version: spark-2.4.1 > java:openjdk version "11.0.2" 2019-01-15 > OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) > scala:2.11.12 > gcc version:4.8.5 >Reporter: dingwei2019 >Priority: Major > Labels: build, test > > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value > size is not a member of Object > ERROR: two errors found > Below is the related code: > 856 test("toString") { > 857 val empty = Matrices.ones(0, 0) > 858 empty.toString(0, 0) > 859 > 860 val mat = Matrices.rand(5, 10, new Random()) > 861 mat.toString(-1, -5) > 862 mat.toString(0, 0) > 863 mat.toString(Int.MinValue, Int.MinValue) > 864 mat.toString(Int.MaxValue, Int.MaxValue) > 865 var lines = mat.toString(6, 50).lines.toArray > 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) > 867 > 868 lines = mat.toString(5, 100).lines.toArray > 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) > 870 } > 871 > 872 test("numNonzeros and numActives") { > 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) > 874 assert(dm1.numNonzeros === 3) > 875 assert(dm1.numActives === 6) > 876 > 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), > Array(0.0, -1.2, 0.0)) > 878 assert(sm1.numNonzeros === 1) > 879 assert(sm1.numActives === 3) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
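A minimal Scala sketch of two ways the ambiguity described in the comment above can be sidestepped on JDK 11 with Scala 2.11/2.12. This is not the fix Spark ultimately shipped for JDK 11 support; it only illustrates the method-resolution point, reusing the Matrices setup and assertions from the quoted test:
{code}
import java.util.Random
import org.apache.spark.ml.linalg.Matrices

val mat = Matrices.rand(5, 10, new Random())

// Option 1: avoid the ambiguous .lines call and split on newlines explicitly,
// which yields a Scala Array[String] that has length/size members.
val lines: Array[String] = mat.toString(6, 50).split("\n")
assert(lines.length == 5 && lines.forall(_.length <= 50))

// Option 2: force the Scala StringOps view so StringLike.lines
// (returning Iterator[String]) is chosen instead of JDK 11's String.lines().
val lines2 = scala.Predef.augmentString(mat.toString(5, 100)).lines.toArray
assert(lines2.size == 5 && lines2.forall(_.size <= 100))
{code}
Option 1 assumes the matrix string uses "\n" line separators; both variants compile the same way on JDK 8 and JDK 11 because the receiver of lines/size is an unambiguous Scala type.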
[jira] [Comment Edited] (SPARK-27537) spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object
[ https://issues.apache.org/jira/browse/SPARK-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822845#comment-16822845 ] dingwei2019 edited comment on SPARK-27537 at 4/22/19 3:15 AM: -- The problem was found in the Spark ML test module. Although it is only a test module, I want to understand it. From the description above it looks like an incompatibility between Java 11 and Scala 2.11.12; if I switch my JDK to JDK 8, the problem goes away. Below is my analysis: it seems that when a method exists on the Java class, the Java method is resolved, and otherwise the Scala one is used. The String class in Java 11 adds a lines() method, and this conflicts with the Scala API: Scala has a lines method in StringLike that returns an Iterator; Iterator has a toArray method that returns an Array; and Array has a size method. So if the Scala method were chosen, there would be no problem: lines (Iterator) --> toArray (Array) --> size. But Java 11's new String.lines() returns a Stream; Stream.toArray() returns Object[], and Object has no size member. That is exactly what the error reports: (Stream) --> toArray (Object) --> no size method. What should I do to solve this problem? was (Author: dingwei2019): The problem was found in the Spark ML test module. Although it is only a test module, I want to understand it. From the description above it looks like an incompatibility between Java 11 and Scala 2.11.12; if I switch my JDK to JDK 8, the problem goes away. Below is my analysis: it seems that when a method exists on the Java class, the Java method is resolved, and otherwise the Scala one is used. The String class in Java 11 adds a lines() method, and this conflicts with the Scala API: Scala has a lines method in StringLike that returns an Iterator; Iterator has a toArray method that returns an Array; and Array has a size method. So if the Scala method were chosen, there would be no problem: lines (Iterator) --> toArray (Array) --> size. But Java 11's new String.lines() returns a Stream; Stream.toArray() returns Object[], and Object has no size member. That is exactly what the error reports: (Stream) --> toArray (Object) --> no size method.
> spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > - > > Key: SPARK-27537 > URL: https://issues.apache.org/jira/browse/SPARK-27537 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.3.0, 2.4.1 > Environment: Machine:aarch64 > OS:Red Hat Enterprise Linux Server release 7.4 > Kernel:4.11.0-44.el7a > spark version: spark-2.4.1 > java:openjdk version "11.0.2" 2019-01-15 > OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) > scala:2.11.12 > gcc version:4.8.5 >Reporter: dingwei2019 >Priority: Major > Labels: build, test > > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value > size is not a member of Object > ERROR: two errors found > Below is the related code: > 856 test("toString") { > 857 val empty = Matrices.ones(0, 0) > 858 empty.toString(0, 0) > 859 > 860 val mat = Matrices.rand(5, 10, new Random()) > 861 mat.toString(-1, -5) > 862 mat.toString(0, 0) > 863 mat.toString(Int.MinValue, Int.MinValue) > 864 mat.toString(Int.MaxValue, Int.MaxValue) > 865 var lines = mat.toString(6, 50).lines.toArray > 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) > 867 > 868 lines = mat.toString(5, 100).lines.toArray > 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) > 870 } > 871 > 872 test("numNonzeros and numActives") { > 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) > 874 assert(dm1.numNonzeros === 3) > 875 assert(dm1.numActives === 6) > 876 > 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), > Array(0.0, -1.2, 0.0)) > 878 assert(sm1.numNonzeros === 1) > 879 assert(sm1.numActives === 3) > what shall i do to solve this problem, and when will spark support jdk11? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27537) spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object
[ https://issues.apache.org/jira/browse/SPARK-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dingwei2019 updated SPARK-27537: Description: [ERROR]: [Error] $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object [ERROR]: [Error] $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value size is not a member of Object ERROR: two errors found Below is the related code: 856 test("toString") { 857 val empty = Matrices.ones(0, 0) 858 empty.toString(0, 0) 859 860 val mat = Matrices.rand(5, 10, new Random()) 861 mat.toString(-1, -5) 862 mat.toString(0, 0) 863 mat.toString(Int.MinValue, Int.MinValue) 864 mat.toString(Int.MaxValue, Int.MaxValue) 865 var lines = mat.toString(6, 50).lines.toArray 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) 867 868 lines = mat.toString(5, 100).lines.toArray 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) 870 } 871 872 test("numNonzeros and numActives") { 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) 874 assert(dm1.numNonzeros === 3) 875 assert(dm1.numActives === 6) 876 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), Array(0.0, -1.2, 0.0)) 878 assert(sm1.numNonzeros === 1) 879 assert(sm1.numActives === 3) what shall i do to solve this problem, and when will spark support jdk11? was: [ERROR]: [Error] $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object [ERROR]: [Error] $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value size is not a member of Object ERROR: two errors found Below is the related code: 856 test("toString") { 857 val empty = Matrices.ones(0, 0) 858 empty.toString(0, 0) 859 860 val mat = Matrices.rand(5, 10, new Random()) 861 mat.toString(-1, -5) 862 mat.toString(0, 0) 863 mat.toString(Int.MinValue, Int.MinValue) 864 mat.toString(Int.MaxValue, Int.MaxValue) 865 var lines = mat.toString(6, 50).lines.toArray 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) 867 868 lines = mat.toString(5, 100).lines.toArray 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) 870 } 871 872 test("numNonzeros and numActives") { 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) 874 assert(dm1.numNonzeros === 3) 875 assert(dm1.numActives === 6) 876 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), Array(0.0, -1.2, 0.0)) 878 assert(sm1.numNonzeros === 1) 879 assert(sm1.numActives === 3) > spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > - > > Key: SPARK-27537 > URL: https://issues.apache.org/jira/browse/SPARK-27537 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.3.0, 2.4.1 > Environment: Machine:aarch64 > OS:Red Hat Enterprise Linux Server release 7.4 > Kernel:4.11.0-44.el7a > spark version: spark-2.4.1 > java:openjdk version "11.0.2" 2019-01-15 > OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) > scala:2.11.12 > gcc version:4.8.5 >Reporter: dingwei2019 >Priority: Major > Labels: build, test > > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value > size is not a member of Object > ERROR: two errors found > Below 
is the related code: > 856 test("toString") { > 857 val empty = Matrices.ones(0, 0) > 858 empty.toString(0, 0) > 859 > 860 val mat = Matrices.rand(5, 10, new Random()) > 861 mat.toString(-1, -5) > 862 mat.toString(0, 0) > 863 mat.toString(Int.MinValue, Int.MinValue) > 864 mat.toString(Int.MaxValue, Int.MaxValue) > 865 var lines = mat.toString(6, 50).lines.toArray > 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) > 867 > 868 lines = mat.toString(5, 100).lines.toArray > 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) > 870 } > 871 > 872 test("numNonzeros and numActives") { > 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) > 874 assert(dm1.numNonzeros === 3) > 875 assert(dm1.numActives === 6) > 876 > 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), > Array(0.0, -1.2, 0.0)) > 878 assert(sm1.numNonzeros === 1) > 879 assert(sm
[jira] [Created] (SPARK-27538) sparksql could not start in jdk11, exception org.datanucleus.exceptions.NucleusException: The java type java.lang.Long (jdbc-type='', sql-type="") cant be mapped for thi
dingwei2019 created SPARK-27538: --- Summary: sparksql could not start in jdk11, exception org.datanucleus.exceptions.NucleusException: The java type java.lang.Long (jdbc-type='', sql-type="") cant be mapped for this datastore. No mapping is available. Key: SPARK-27538 URL: https://issues.apache.org/jira/browse/SPARK-27538 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.1, 2.3.0 Environment: Machine:aarch64 OS:Red Hat Enterprise Linux Server release 7.4 Kernel:4.11.0-44.el7a spark version: spark-2.4.1 java:openjdk version "11.0.2" 2019-01-15 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) scala:2.11.12 Reporter: dingwei2019 [root@172-19-18-8 spark-2.4.1-bin-hadoop2.7-bak]# bin/spark-sql WARNING: An illegal reflective access operation has occurred WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/dingwei/spark-2.4.1-bin-x86/spark-2.4.1-bin-hadoop2.7-bak/jars/spark-unsafe_2.11-2.4.1.jar) to method java.nio.Bits.unaligned() WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release 2019-04-22 11:27:34,419 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2019-04-22 11:27:35,306 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 2019-04-22 11:27:35,330 INFO metastore.ObjectStore: ObjectStore, initialize called 2019-04-22 11:27:35,492 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 2019-04-22 11:27:35,492 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored 2019-04-22 11:27:37,012 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 2019-04-22 11:27:37,638 WARN DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDatabase and subclasses resulted in no possible candidates The java type java.lang.Long (jdbc-type="", sql-type="") cant be mapped for this datastore. No mapping is available. org.datanucleus.exceptions.NucleusException: The java type java.lang.Long (jdbc-type="", sql-type="") cant be mapped for this datastore. No mapping is available. 
at org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1215) at org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1378) at org.datanucleus.store.rdbms.table.AbstractClassTable.addDatastoreId(AbstractClassTable.java:392) at org.datanucleus.store.rdbms.table.ClassTable.initializePK(ClassTable.java:1087) at org.datanucleus.store.rdbms.table.ClassTable.preInitialize(ClassTable.java:247) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTable(RDBMSStoreManager.java:3118) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTables(RDBMSStoreManager.java:2909) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3182) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841) at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122) at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605) at org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954) at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679) at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408) at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:947) at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:370) at org.datanucleus.store.query.Query.executeQuery(Query.java:1744) at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672) at org.datanucleus.store.query.Query.execute(Query.java:1654) at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:183) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.(MetaStoreDirectSql.java:137) at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:295) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java
[jira] [Commented] (SPARK-27538) sparksql could not start in jdk11, exception org.datanucleus.exceptions.NucleusException: The java type java.lang.Long (jdbc-type='', sql-type="") cant be mapped for t
[ https://issues.apache.org/jira/browse/SPARK-27538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822889#comment-16822889 ] Yuming Wang commented on SPARK-27538: - Support for JDK 11 is still in progress: SPARK-24417 > sparksql could not start in jdk11, exception > org.datanucleus.exceptions.NucleusException: The java type java.lang.Long > (jdbc-type='', sql-type="") cant be mapped for this datastore. No mapping is > available. > -- > > Key: SPARK-27538 > URL: https://issues.apache.org/jira/browse/SPARK-27538 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.1 > Environment: Machine:aarch64 > OS:Red Hat Enterprise Linux Server release 7.4 > Kernel:4.11.0-44.el7a > spark version: spark-2.4.1 > java:openjdk version "11.0.2" 2019-01-15 > OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) > scala:2.11.12 >Reporter: dingwei2019 >Priority: Major > Labels: features > > [root@172-19-18-8 spark-2.4.1-bin-hadoop2.7-bak]# bin/spark-sql > WARNING: An illegal reflective access operation has occurred > WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform > (file:/home/dingwei/spark-2.4.1-bin-x86/spark-2.4.1-bin-hadoop2.7-bak/jars/spark-unsafe_2.11-2.4.1.jar) > to method java.nio.Bits.unaligned() > WARNING: Please consider reporting this to the maintainers of > org.apache.spark.unsafe.Platform > WARNING: Use --illegal-access=warn to enable warnings of further illegal > reflective access operations > WARNING: All illegal access operations will be denied in a future release > 2019-04-22 11:27:34,419 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > 2019-04-22 11:27:35,306 INFO metastore.HiveMetaStore: 0: Opening raw store > with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore > 2019-04-22 11:27:35,330 INFO metastore.ObjectStore: ObjectStore, initialize > called > 2019-04-22 11:27:35,492 INFO DataNucleus.Persistence: Property > hive.metastore.integral.jdo.pushdown unknown - will be ignored > 2019-04-22 11:27:35,492 INFO DataNucleus.Persistence: Property > datanucleus.cache.level2 unknown - will be ignored > 2019-04-22 11:27:37,012 INFO metastore.ObjectStore: Setting MetaStore object > pin classes with > hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" > 2019-04-22 11:27:37,638 WARN DataNucleus.Query: Query for candidates of > org.apache.hadoop.hive.metastore.model.MDatabase and subclasses resulted in > no possible candidates > The java type java.lang.Long (jdbc-type="", sql-type="") cant be mapped for > this datastore. No mapping is available. > org.datanucleus.exceptions.NucleusException: The java type java.lang.Long > (jdbc-type="", sql-type="") cant be mapped for this datastore. No mapping is > available. 
> at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1215) > at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1378) > at > org.datanucleus.store.rdbms.table.AbstractClassTable.addDatastoreId(AbstractClassTable.java:392) > at > org.datanucleus.store.rdbms.table.ClassTable.initializePK(ClassTable.java:1087) > at > org.datanucleus.store.rdbms.table.ClassTable.preInitialize(ClassTable.java:247) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTable(RDBMSStoreManager.java:3118) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTables(RDBMSStoreManager.java:2909) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3182) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841) > at > org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605) > at > org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679) > at > org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408) > at > org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:947) > at > org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:370) > at org.datanucleus.store.query.Query.executeQue
[jira] [Commented] (SPARK-13263) SQL generation support for tablesample
[ https://issues.apache.org/jira/browse/SPARK-13263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822891#comment-16822891 ] angerszhu commented on SPARK-13263: --- [~Tagar] I have made some changes in Spark SQL's AstBuilder that can support this. > SQL generation support for tablesample > -- > > Key: SPARK-13263 > URL: https://issues.apache.org/jira/browse/SPARK-13263 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Major > Fix For: 2.0.0 > > > {code} > SELECT s.id FROM t0 TABLESAMPLE(0.1 PERCENT) s > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org