[jira] [Assigned] (SPARK-27527) Improve description of Timestamp and Date types
[ https://issues.apache.org/jira/browse/SPARK-27527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-27527: Assignee: Maxim Gekk > Improve description of Timestamp and Date types > --- > > Key: SPARK-27527 > URL: https://issues.apache.org/jira/browse/SPARK-27527 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 2.4.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Trivial > > Describe precisely semantic of TimestampType and DateType, how they represent > dates and timestamps internally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27527) Improve description of Timestamp and Date types
[ https://issues.apache.org/jira/browse/SPARK-27527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-27527. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24424 [https://github.com/apache/spark/pull/24424] > Improve description of Timestamp and Date types > --- > > Key: SPARK-27527 > URL: https://issues.apache.org/jira/browse/SPARK-27527 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 2.4.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Trivial > Fix For: 3.0.0 > > > Describe precisely semantic of TimestampType and DateType, how they represent > dates and timestamps internally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
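For context, the representations that the improved documentation describes can be sketched in a few lines (a minimal example, assuming a spark-shell session with a SparkSession named `spark`): DateType values are stored internally as a number of days since the Unix epoch 1970-01-01, and TimestampType values as a number of microseconds since the epoch.
{code}
import org.apache.spark.sql.functions._

// DATE '1970-01-11' is 10 days after the epoch; casting the timestamp to long
// exposes its epoch offset in seconds (internally it is kept in microseconds).
val df = spark.sql("SELECT DATE '1970-01-11' AS d, TIMESTAMP '1970-01-01 00:00:01' AS ts")
df.select(
  datediff(col("d"), lit("1970-01-01")).as("days_since_epoch"),
  col("ts").cast("long").as("seconds_since_epoch")
).show()
{code}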
[jira] [Resolved] (SPARK-27532) Correct the default value in the Documentation for "spark.redaction.regex"
[ https://issues.apache.org/jira/browse/SPARK-27532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-27532. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24428 [https://github.com/apache/spark/pull/24428] > Correct the default value in the Documentation for "spark.redaction.regex" > -- > > Key: SPARK-27532 > URL: https://issues.apache.org/jira/browse/SPARK-27532 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Shivu Sondur >Assignee: Shivu Sondur >Priority: Minor > Fix For: 3.0.0 > > > Correct the default value in the Documentation for "spark.redaction.regex". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27532) Correct the default value in the Documentation for "spark.redaction.regex"
[ https://issues.apache.org/jira/browse/SPARK-27532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-27532: Assignee: Shivu Sondur > Correct the default value in the Documentation for "spark.redaction.regex" > -- > > Key: SPARK-27532 > URL: https://issues.apache.org/jira/browse/SPARK-27532 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Shivu Sondur >Assignee: Shivu Sondur >Priority: Minor > > Correct the default value in the Documentation for "spark.redaction.regex". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27533) Date/timestamps CSV benchmarks
[ https://issues.apache.org/jira/browse/SPARK-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27533: --- Summary: Date/timestamps CSV benchmarks (was: CSV benchmarks date/timestamp ops ) > Date/timestamps CSV benchmarks > -- > > Key: SPARK-27533 > URL: https://issues.apache.org/jira/browse/SPARK-27533 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.1 >Reporter: Maxim Gekk >Priority: Minor > > Extend CSVBenchmark by new benchmarks: > - Write dates/timestamps to files > - Read/infer dates/timestamp from files > - Read/infer dates/timestamps from Dataset[String] > - to_csv/from_csv for dates/timestamps -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27533) CSV benchmarks date/timestamp ops
Maxim Gekk created SPARK-27533: -- Summary: CSV benchmarks date/timestamp ops Key: SPARK-27533 URL: https://issues.apache.org/jira/browse/SPARK-27533 Project: Spark Issue Type: Test Components: SQL Affects Versions: 2.4.1 Reporter: Maxim Gekk Extend CSVBenchmark by new benchmarks: - Write dates/timestamps to files - Read/infer dates/timestamp from files - Read/infer dates/timestamps from Dataset[String] - to_csv/from_csv for dates/timestamps -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27533) Date and timestamp CSV benchmarks
[ https://issues.apache.org/jira/browse/SPARK-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27533: --- Summary: Date and timestamp CSV benchmarks (was: Date/timestamps CSV benchmarks) > Date and timestamp CSV benchmarks > - > > Key: SPARK-27533 > URL: https://issues.apache.org/jira/browse/SPARK-27533 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.1 >Reporter: Maxim Gekk >Priority: Minor > > Extend CSVBenchmark by new benchmarks: > - Write dates/timestamps to files > - Read/infer dates/timestamp from files > - Read/infer dates/timestamps from Dataset[String] > - to_csv/from_csv for dates/timestamps -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
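As an illustration of the last bullet, a rough sketch of the round trip such a benchmark would time (assuming a spark-shell session on master, where the to_csv/from_csv functions are available):
{code}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Build rows with a timestamp column, format them with to_csv,
// then parse them back with from_csv -- the round trip to measure.
val withTs = spark.range(1000000).select(struct(col("id").cast("timestamp").as("ts")).as("r"))
val asCsv = withTs.select(to_csv(col("r")).as("line"))
val parsed = asCsv.select(
  from_csv(col("line"), new StructType().add("ts", TimestampType), Map.empty[String, String]).as("r"))
parsed.count()
{code}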
[jira] [Commented] (SPARK-27367) Faster RoaringBitmap Serialization with v0.8.0
[ https://issues.apache.org/jira/browse/SPARK-27367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822724#comment-16822724 ] Liang-Chi Hsieh commented on SPARK-27367: - I changed the Spark code to use the new API when upgrading to the new version of RoaringBitmap. The size of the bitmap is also related to the sparseness and distribution of empty blocks. I don't have a real workload that produces a big bitmap, so I manually created a HighlyCompressedMapStatus and benchmarked serializing/deserializing the bitmap inside it. I used a pretty big block sizes array for the HighlyCompressedMapStatus. I think we don't set such a number of partitions (1) on the reduce side. With this bitmap, I can see a small performance difference (9ms vs. 6ms) between the old and new serde APIs. {code} val conf = new SparkConf(false) conf.set(KRYO_REGISTRATION_REQUIRED, true) val ser = new KryoSerializer(conf).newInstance() val blockSizes = (0L until 1L).map { i => if (i % 2 == 0) { 0L } else { i } }.toArray val serialized = ser.serialize(HighlyCompressedMapStatus(BlockManagerId("exec-1", "host", 1234), blockSizes)) ser.deserialize(serialized) {code} > Faster RoaringBitmap Serialization with v0.8.0 > -- > > Key: SPARK-27367 > URL: https://issues.apache.org/jira/browse/SPARK-27367 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Imran Rashid >Priority: Major > > RoaringBitmap 0.8.0 adds faster serde, but also requires us to change how we > call the serde routines slightly to take advantage of it. This is probably a > worthwhile optimization as every shuffle map task with a large # of > partitions generates these bitmaps, and the driver especially has to > deserialize many of these messages. > See > * https://github.com/apache/spark/pull/24264#issuecomment-479675572 > * https://github.com/RoaringBitmap/RoaringBitmap/pull/325 > * https://github.com/RoaringBitmap/RoaringBitmap/issues/319 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24601) Bump Jackson version to 2.9.6
[ https://issues.apache.org/jira/browse/SPARK-24601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24601: -- Fix Version/s: 2.4.3 > Bump Jackson version to 2.9.6 > - > > Key: SPARK-24601 > URL: https://issues.apache.org/jira/browse/SPARK-24601 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Fokko Driesprong >Priority: Major > Fix For: 3.0.0, 2.4.3 > > > The Jackson version is lacking behind, and therefore I have to add a lot of > exclusions to the SBT files: > ``` > Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible > Jackson version: 2.9.5 > at > com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64) > at > com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19) > at > com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala:82) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
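For context, a hedged sketch of the kind of sbt workaround the description alludes to before this bump -- pinning a single Jackson version in the downstream build (sbt 1.x syntax; the module list is illustrative, not exhaustive):
{code}
// build.sbt -- force one Jackson version so jackson-module-scala and Spark's jackson-databind agree
dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.9.6",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.6",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.9.6"
)
{code}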
[jira] [Updated] (SPARK-27051) Bump Jackson version to 2.9.8
[ https://issues.apache.org/jira/browse/SPARK-27051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27051: -- Fix Version/s: 2.4.3 > Bump Jackson version to 2.9.8 > - > > Key: SPARK-27051 > URL: https://issues.apache.org/jira/browse/SPARK-27051 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Yanbo Liang >Assignee: Yanbo Liang >Priority: Major > Fix For: 3.0.0, 2.4.3 > > > Fasterxml Jackson version before 2.9.8 is affected by multiple CVEs > [[https://github.com/FasterXML/jackson-databind/issues/2186]], we need to fix > bump the dependent Jackson to 2.9.8. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24601) Bump Jackson version to 2.9.6
[ https://issues.apache.org/jira/browse/SPARK-24601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-24601: - Assignee: Fokko Driesprong > Bump Jackson version to 2.9.6 > - > > Key: SPARK-24601 > URL: https://issues.apache.org/jira/browse/SPARK-24601 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 3.0.0, 2.4.3 > > > The Jackson version is lacking behind, and therefore I have to add a lot of > exclusions to the SBT files: > ``` > Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible > Jackson version: 2.9.5 > at > com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64) > at > com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19) > at > com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala:82) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27439) Explain result should match collected result after view change
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27439: -- Summary: Explain result should match collected result after view change (was: createOrReplaceTempView cannot update old dataset) > Explain result should match collected result after view change > -- > > Key: SPARK-27439 > URL: https://issues.apache.org/jira/browse/SPARK-27439 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.1 >Reporter: xjl >Priority: Major > > {code:java} > SparkSession spark = SparkSession > .builder() > .appName("app").enableHiveSupport().master("local[4]") > .getOrCreate(); > spark.sql("select * from default.t1").createOrReplaceTempView("tmp001"); > Dataset hiveTable = spark.sql("select * from tmp001"); > spark.sql("select * from default.t2").createOrReplaceTempView("tmp001"); > hiveTable.show(); > > } > {code} > hiveTable show the value of t1 but not t2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27439) Explain result should match collected result after view change
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27439: -- Priority: Minor (was: Major) > Explain result should match collected result after view change > -- > > Key: SPARK-27439 > URL: https://issues.apache.org/jira/browse/SPARK-27439 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.1 >Reporter: xjl >Priority: Minor > > {code:java} > SparkSession spark = SparkSession > .builder() > .appName("app").enableHiveSupport().master("local[4]") > .getOrCreate(); > spark.sql("select * from default.t1").createOrReplaceTempView("tmp001"); > Dataset hiveTable = spark.sql("select * from tmp001"); > spark.sql("select * from default.t2").createOrReplaceTempView("tmp001"); > hiveTable.show(); > > } > {code} > hiveTable show the value of t1 but not t2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27439) Use analyzed plan when explaining Dataset
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27439: -- Summary: Use analyzed plan when explaining Dataset (was: Explain result should match collected result after view change) > Use analyzed plan when explaining Dataset > - > > Key: SPARK-27439 > URL: https://issues.apache.org/jira/browse/SPARK-27439 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.1 >Reporter: xjl >Priority: Minor > > {code:java} > SparkSession spark = SparkSession > .builder() > .appName("app").enableHiveSupport().master("local[4]") > .getOrCreate(); > spark.sql("select * from default.t1").createOrReplaceTempView("tmp001"); > Dataset hiveTable = spark.sql("select * from tmp001"); > spark.sql("select * from default.t2").createOrReplaceTempView("tmp001"); > hiveTable.show(); > > } > {code} > hiveTable show the value of t1 but not t2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27439) Use analyzed plan when explaining Dataset
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27439: -- Issue Type: Improvement (was: Bug) > Use analyzed plan when explaining Dataset > - > > Key: SPARK-27439 > URL: https://issues.apache.org/jira/browse/SPARK-27439 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: xjl >Priority: Minor > > {code:java} > SparkSession spark = SparkSession > .builder() > .appName("app").enableHiveSupport().master("local[4]") > .getOrCreate(); > spark.sql("select * from default.t1").createOrReplaceTempView("tmp001"); > Dataset hiveTable = spark.sql("select * from tmp001"); > spark.sql("select * from default.t2").createOrReplaceTempView("tmp001"); > hiveTable.show(); > > } > {code} > hiveTable show the value of t1 but not t2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27439) Use analyzed plan when explaining Dataset
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27439: -- Description: {code} scala> spark.range(10).createOrReplaceTempView("test") scala> spark.range(5).createOrReplaceTempView("test2") scala> spark.sql("select * from test").createOrReplaceTempView("tmp001") scala> val df = spark.sql("select * from tmp001") scala> spark.sql("select * from test2").createOrReplaceTempView("tmp001") scala> df.show +---+ | id| +---+ | 0| | 1| | 2| | 3| | 4| | 5| | 6| | 7| | 8| | 9| +---+ scala> df.explain {code} Before: {code} == Physical Plan == *(1) Range (0, 5, step=1, splits=12) {code} After: {code} == Physical Plan == *(1) Range (0, 10, step=1, splits=12) {code} was: {code:java} SparkSession spark = SparkSession .builder() .appName("app").enableHiveSupport().master("local[4]") .getOrCreate(); spark.sql("select * from default.t1").createOrReplaceTempView("tmp001"); Dataset hiveTable = spark.sql("select * from tmp001"); spark.sql("select * from default.t2").createOrReplaceTempView("tmp001"); hiveTable.show(); } {code} hiveTable show the value of t1 but not t2 > Use analyzed plan when explaining Dataset > - > > Key: SPARK-27439 > URL: https://issues.apache.org/jira/browse/SPARK-27439 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: xjl >Priority: Minor > > {code} > scala> spark.range(10).createOrReplaceTempView("test") > scala> spark.range(5).createOrReplaceTempView("test2") > scala> spark.sql("select * from test").createOrReplaceTempView("tmp001") > scala> val df = spark.sql("select * from tmp001") > scala> spark.sql("select * from test2").createOrReplaceTempView("tmp001") > scala> df.show > +---+ > | id| > +---+ > | 0| > | 1| > | 2| > | 3| > | 4| > | 5| > | 6| > | 7| > | 8| > | 9| > +---+ > scala> df.explain > {code} > Before: > {code} > == Physical Plan == > *(1) Range (0, 5, step=1, splits=12) > {code} > After: > {code} > == Physical Plan == > *(1) Range (0, 10, step=1, splits=12) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27439) Use analyzed plan when explaining Dataset
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27439. --- Resolution: Fixed Assignee: Liang-Chi Hsieh Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/24415 > Use analyzed plan when explaining Dataset > - > > Key: SPARK-27439 > URL: https://issues.apache.org/jira/browse/SPARK-27439 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: xjl >Assignee: Liang-Chi Hsieh >Priority: Minor > Fix For: 3.0.0 > > > {code} > scala> spark.range(10).createOrReplaceTempView("test") > scala> spark.range(5).createOrReplaceTempView("test2") > scala> spark.sql("select * from test").createOrReplaceTempView("tmp001") > scala> val df = spark.sql("select * from tmp001") > scala> spark.sql("select * from test2").createOrReplaceTempView("tmp001") > scala> df.show > +---+ > | id| > +---+ > | 0| > | 1| > | 2| > | 3| > | 4| > | 5| > | 6| > | 7| > | 8| > | 9| > +---+ > scala> df.explain > {code} > Before: > {code} > == Physical Plan == > *(1) Range (0, 5, step=1, splits=12) > {code} > After: > {code} > == Physical Plan == > *(1) Range (0, 10, step=1, splits=12) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27529) Spark Streaming consumer dies with kafka.common.OffsetOutOfRangeException
[ https://issues.apache.org/jira/browse/SPARK-27529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Goldenberg updated SPARK-27529: -- Description: We have a Spark Streaming consumer which at a certain point started consistently failing upon a restart with the below error. Some details: * Spark version is 1.5.0. * Kafka version is 0.8.2.1 (2.10-0.8.2.1). * The topic is configured with: retention.ms=1471228928, max.message.bytes=1. * The consumer runs with auto.offset.reset=smallest. * No checkpointing is currently enabled. I don't see anything in the Spark or Kafka doc to understand why this is happening. From googling around, {noformat} https://blog.cloudera.com/blog/2015/03/exactly-once-spark-streaming-from-apache-kafka/ Finally, I’ll repeat that any semantics beyond at-most-once require that you have sufficient log retention in Kafka. If you’re seeing things like OffsetOutOfRangeException, it’s probably because you underprovisioned Kafka storage, not because something’s wrong with Spark or Kafka.{noformat} Also looking at SPARK-12693 and SPARK-11693, I don't understand the possible causes. {noformat} You've under-provisioned Kafka storage and / or Spark compute capacity. The result is that data is being deleted before it has been processed.{noformat} All we're trying to do is start the consumer and consume from the topic from the earliest available offset. Why would we not be able to do that? How can the offsets be out of range if we're saying, just read from the earliest available? Since we have the retention.ms set to 1 year and we created the topic just a few weeks ago, I'd not expect any deletion being done by Kafka as we're consuming. I'd like to understand the actual cause of this error. Any recommendations on a workaround would be appreciated. 
Stack traces: {noformat} 2019-04-19 11:35:17,147 ERROR org.apache.spark.scheduler .TaskSetManager: Task 10 in stage 147.0 failed 4 times; aborting job 2019-04-19 11:35:17,160 ERROR org.apache.spark.streaming.scheduler.JobScheduler: Error running job streaming job 1555682554000 ms.0 org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 147.0 failed 4 times, most recent failure: Lost task 10.3 in stage 147.0 (TID 2368, 10.150.0.58): kafka.common.OffsetOutOfRangeException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at java.lang.Class.newInstance(Class.java:442) at kafka.common.ErrorMapping$.exceptionFor(ErrorMapping.scala:86) at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.handleFetchErr(KafkaRDD.scala:184) at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.fetchBatch(KafkaRDD.scala:193) at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.getNext(KafkaRDD.scala:208) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29) at com.acme.consumer.kafka.spark.ProcessPartitionFunction.call(ProcessPartitionFunction.java:69) at com.acme.consumer.kafka.spark.ProcessPartitionFunction.call(ProcessPartitionFunction.java:24) at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222) at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:898) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:898) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.sca la:1280) ~[spark-assembly-1.5.0-hadoop2.4.0.jar:1.5.0] at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268) ~[spark-assembly-1.5.0-hadoop2.4 .0.jar:1.5.0] at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267) ~
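For reference, a minimal sketch of the direct-stream setup the description refers to (Spark 1.5 / Kafka 0.8 integration; broker and topic names are hypothetical, and a SparkContext `sc` is assumed), with auto.offset.reset=smallest so the consumer starts from the earliest retained offset:
{code}
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(sc, Seconds(10))
val kafkaParams = Map(
  "metadata.broker.list" -> "broker1:9092", // hypothetical broker
  "auto.offset.reset" -> "smallest")        // start from the earliest offset Kafka still retains
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("mytopic"))         // hypothetical topic
stream.foreachRDD(rdd => println(rdd.count()))
ssc.start()
{code}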
[jira] [Resolved] (SPARK-27473) Support filter push down for status fields in binary file data source
[ https://issues.apache.org/jira/browse/SPARK-27473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-27473. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24387 [https://github.com/apache/spark/pull/24387] > Support filter push down for status fields in binary file data source > - > > Key: SPARK-27473 > URL: https://issues.apache.org/jira/browse/SPARK-27473 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Weichen Xu >Priority: Major > Fix For: 3.0.0 > > > As a user, I can use > `spark.read.format("binaryFile").load(path).filter($"status.length" < > 100000000L)` to load files that are less than 1e8 bytes. Spark shouldn't even > read files that are bigger than 1e8 bytes in this case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25348) Data source for binary files
[ https://issues.apache.org/jira/browse/SPARK-25348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818640#comment-16818640 ] Xiangrui Meng edited comment on SPARK-25348 at 4/21/19 7:49 PM: I created follow-up tasks: * Documentation: SPARK-27472 * Filter push down: SPARK-27473 * Content column pruning: SPARK-27534 was (Author: mengxr): I created two follow-up tasks: * Documentation: SPARK-27472 * Filter push down: SPARK-27473 > Data source for binary files > > > Key: SPARK-25348 > URL: https://issues.apache.org/jira/browse/SPARK-25348 > Project: Spark > Issue Type: Story > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Weichen Xu >Priority: Major > Fix For: 3.0.0 > > > It would be useful to have a data source implementation for binary files, > which can be used to build features to load images, audio, and videos. > Microsoft has an implementation at > [https://github.com/Azure/mmlspark/tree/master/src/io/binary.] It would be > great if we can merge it into Spark main repo. > cc: [~mhamilton] and [~imatiach] > Proposed API: > Format name: "binaryFile" > Schema: > * content: BinaryType > * status (following Hadoop FIleStatus): > ** path: StringType > ** modificationTime: Timestamp > ** length: LongType (size limit 2GB) > Options: > * pathGlobFilter: only include files with path matching the glob pattern > Input partition size can be controlled by common SQL confs: maxPartitionBytes > and openCostInBytes -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27534) Do not load `content` column in binary data source if it is not selected
Xiangrui Meng created SPARK-27534: - Summary: Do not load `content` column in binary data source if it is not selected Key: SPARK-27534 URL: https://issues.apache.org/jira/browse/SPARK-27534 Project: Spark Issue Type: Story Components: SQL Affects Versions: 3.0.0 Reporter: Xiangrui Meng A follow-up task from SPARK-25348. To save I/O cost, Spark shouldn't attempt to read the file if users didn't request the `content` column. For example: {code} spark.read.format("binaryFile").load(path).filter($"length" < 100).count() {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27535) Date and timestamp JSON benchmarks
Maxim Gekk created SPARK-27535: -- Summary: Date and timestamp JSON benchmarks Key: SPARK-27535 URL: https://issues.apache.org/jira/browse/SPARK-27535 Project: Spark Issue Type: Test Components: SQL Affects Versions: 2.4.1 Reporter: Maxim Gekk Extend JSONBenchmark by new benchmarks: * Write dates/timestamps to files * Read/infer dates/timestamp from files * Read/infer dates/timestamps from Dataset[String] * to_json/from_json for dates/timestamps -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
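As an illustration of the Dataset[String] bullet, a rough sketch (assuming a spark-shell session) of reading timestamps from an in-memory Dataset[String] of JSON records with an explicit schema and timestamp format:
{code}
import org.apache.spark.sql.types._
import spark.implicits._

// One JSON record per element; parse the ts field as TimestampType.
val ds = spark.range(1000000).map(i => s"""{"ts":"1970-01-01 00:00:${"%02d".format(i % 60)}"}""")
spark.read
  .schema(new StructType().add("ts", TimestampType))
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
  .json(ds)
  .count()
{code}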
[jira] [Commented] (SPARK-27287) PCAModel.load() does not honor spark configs
[ https://issues.apache.org/jira/browse/SPARK-27287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822759#comment-16822759 ] Dharmesh Kakadia commented on SPARK-27287: -- [~mgaido] what do you mean by : use the sparkSession when reading ML models ? Also, for whats its worth, if I use the following to set the config, the same PCAModel.load() call works. spark._jsc.hadoopConfiguration().set("fs.azure.account.key.test.blob.core.windows.net","Xosad==") > PCAModel.load() does not honor spark configs > > > Key: SPARK-27287 > URL: https://issues.apache.org/jira/browse/SPARK-27287 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.0 >Reporter: Dharmesh Kakadia >Priority: Major > > PCAModel.load() does not seem to be using the configurations set on the > current spark session. > Repro: > > The following will fail to read the data because the storage account > credentials config used/propagated. > conf.set("fs.azure.account.key.test.blob.core.windows.net","Xosad==") > spark = > SparkSession.builder.appName("dharmesh").config(conf=conf).master('spark://spark-master:7077').getOrCreate() > model = PCAModel.load('wasb://t...@test.blob.core.windows.net/model') > > The following however works: > conf.set("fs.azure.account.key.test.blob.core.windows.net","Xosad==") > spark = > SparkSession.builder.appName("dharmesh").config(conf=conf).master('spark://spark-master:7077').getOrCreate() > blah = > spark.read.json('wasb://t...@test.blob.core.windows.net/somethingelse/') > blah.show() > model = PCAModel.load('wasb://t...@test.blob.core.windows.net/model') > > It looks like spark.read...() does force the use of the config once and then > PCAModel.load() will work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
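A hedged Scala equivalent of the workaround in the comment above -- setting the storage key directly on the SparkContext's Hadoop configuration before loading the model (the account, container, and key below are placeholders):
{code}
import org.apache.spark.ml.feature.PCAModel

// Counterpart of spark._jsc.hadoopConfiguration().set(...) in PySpark:
// put the key into the Hadoop configuration that the ML reader's FileSystem will see.
spark.sparkContext.hadoopConfiguration
  .set("fs.azure.account.key.test.blob.core.windows.net", "Xosad==")
val model = PCAModel.load("wasb://mycontainer@test.blob.core.windows.net/model")
{code}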
[jira] [Updated] (SPARK-27274) Refer to Scala 2.12 in docs; deprecate Scala 2.11 support in 2.4.1
[ https://issues.apache.org/jira/browse/SPARK-27274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-27274: -- Labels: release-notes (was: ) > Refer to Scala 2.12 in docs; deprecate Scala 2.11 support in 2.4.1 > -- > > Key: SPARK-27274 > URL: https://issues.apache.org/jira/browse/SPARK-27274 > Project: Spark > Issue Type: Task > Components: Documentation, Spark Core >Affects Versions: 2.4.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > Labels: release-notes > Fix For: 2.4.1 > > > Scala 2.11 support is removed in Spark 3.0, so we should at least call it > deprecated in 2.4.x. > The 2.4.x docs currently refer to Scala 2.11 artifacts. As 2.12 has been > supported since 2.4.0 without any significant issues, we should refer to 2.12 > artifacts in the docs by default as well. > You could say this implicitly declares it 'unexperimental', if it ever was > deemed experimental, as we'd certainly support 2.12 and not change that > support for the foreseeable future. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24601) Bump Jackson version to 2.9.6
[ https://issues.apache.org/jira/browse/SPARK-24601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-24601: -- Labels: release-notes (was: ) > Bump Jackson version to 2.9.6 > - > > Key: SPARK-24601 > URL: https://issues.apache.org/jira/browse/SPARK-24601 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: release-notes > Fix For: 3.0.0, 2.4.3 > > > The Jackson version is lacking behind, and therefore I have to add a lot of > exclusions to the SBT files: > ``` > Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible > Jackson version: 2.9.5 > at > com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64) > at > com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19) > at > com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala:82) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24601) Bump Jackson version to 2.9.6
[ https://issues.apache.org/jira/browse/SPARK-24601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-24601: -- Docs Text: Spark's Jackson dependency has been updated from 2.6.x to 2.9.x. User applications that inherit Spark's Jackson version should note that various Jackson behaviors changed between these releases. > Bump Jackson version to 2.9.6 > - > > Key: SPARK-24601 > URL: https://issues.apache.org/jira/browse/SPARK-24601 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: release-notes > Fix For: 3.0.0, 2.4.3 > > > The Jackson version is lacking behind, and therefore I have to add a lot of > exclusions to the SBT files: > ``` > Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible > Jackson version: 2.9.5 > at > com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64) > at > com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19) > at > com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala:82) > at > org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27419) When setting spark.executor.heartbeatInterval to a value less than 1 seconds, it will always fail
[ https://issues.apache.org/jira/browse/SPARK-27419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-27419: -- Labels: release-notes (was: ) > When setting spark.executor.heartbeatInterval to a value less than 1 seconds, > it will always fail > - > > Key: SPARK-27419 > URL: https://issues.apache.org/jira/browse/SPARK-27419 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 2.4.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > Fix For: 2.4.2 > > > When setting spark.executor.heartbeatInterval to a value less than 1 seconds > in branch-2.4, it will always fail because the value will be converted to 0 > and the heartbeat will always timeout and finally kill the executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27480) Improve explain output of describe query command to show the actual input query as opposed to a truncated logical plan.
[ https://issues.apache.org/jira/browse/SPARK-27480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27480: -- Issue Type: Improvement (was: Bug) > Improve explain output of describe query command to show the actual input > query as opposed to a truncated logical plan. > --- > > Key: SPARK-27480 > URL: https://issues.apache.org/jira/browse/SPARK-27480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: Dilip Biswal >Priority: Minor > > Currently running explain on describe query gives a little confusing output. > Instead of showing the actual query that is input by the user, it shows the > truncated logical plan as the input. We should improve it to show the query > text as input by user. > Here are the sample outputs of the explain command. > > {code:java} > EXPLAIN DESCRIBE WITH s AS (SELECT 'hello' as col1) SELECT * FROM s; > == Physical Plan == > Execute DescribeQueryCommand >+- DescribeQueryCommand CTE [s] > {code} > {code:java} > EXPLAIN EXTENDED DESCRIBE SELECT * from s1 where c1 > 0; > == Physical Plan == > Execute DescribeQueryCommand >+- DescribeQueryCommand 'Project [*] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27480) Improve `EXPLAIN DESC QUERY` to show the input SQL statement
[ https://issues.apache.org/jira/browse/SPARK-27480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27480. --- Resolution: Fixed Assignee: Dilip Biswal Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/24385 > Improve `EXPLAIN DESC QUERY` to show the input SQL statement > > > Key: SPARK-27480 > URL: https://issues.apache.org/jira/browse/SPARK-27480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Minor > Fix For: 3.0.0 > > > Currently running explain on describe query gives a little confusing output. > Instead of showing the actual query that is input by the user, it shows the > truncated logical plan as the input. We should improve it to show the query > text as input by user. > Here are the sample outputs of the explain command. > > {code:java} > EXPLAIN DESCRIBE WITH s AS (SELECT 'hello' as col1) SELECT * FROM s; > == Physical Plan == > Execute DescribeQueryCommand >+- DescribeQueryCommand CTE [s] > {code} > {code:java} > EXPLAIN EXTENDED DESCRIBE SELECT * from s1 where c1 > 0; > == Physical Plan == > Execute DescribeQueryCommand >+- DescribeQueryCommand 'Project [*] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27480) Improve `EXPLAIN DESC QUERY` to show the input SQL statement
[ https://issues.apache.org/jira/browse/SPARK-27480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27480: -- Summary: Improve `EXPLAIN DESC QUERY` to show the input SQL statement (was: Improve explain output of describe query command to show the actual input query as opposed to a truncated logical plan.) > Improve `EXPLAIN DESC QUERY` to show the input SQL statement > > > Key: SPARK-27480 > URL: https://issues.apache.org/jira/browse/SPARK-27480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: Dilip Biswal >Priority: Minor > > Currently running explain on describe query gives a little confusing output. > Instead of showing the actual query that is input by the user, it shows the > truncated logical plan as the input. We should improve it to show the query > text as input by user. > Here are the sample outputs of the explain command. > > {code:java} > EXPLAIN DESCRIBE WITH s AS (SELECT 'hello' as col1) SELECT * FROM s; > == Physical Plan == > Execute DescribeQueryCommand >+- DescribeQueryCommand CTE [s] > {code} > {code:java} > EXPLAIN EXTENDED DESCRIBE SELECT * from s1 where c1 > 0; > == Physical Plan == > Execute DescribeQueryCommand >+- DescribeQueryCommand 'Project [*] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27513) Spark tarball with binaries should have files owned by uid 0
[ https://issues.apache.org/jira/browse/SPARK-27513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822789#comment-16822789 ] koert kuipers commented on SPARK-27513: --- i think this can be closed as wont fix > Spark tarball with binaries should have files owned by uid 0 > > > Key: SPARK-27513 > URL: https://issues.apache.org/jira/browse/SPARK-27513 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.1 >Reporter: koert kuipers >Priority: Minor > Fix For: 3.0.0 > > > currently the tarball is created in dev/make-distribution.sh like this: > {code:bash} > tar czf "spark-$VERSION-bin-$NAME.tgz" -C "$SPARK_HOME" "$TARDIR_NAME" > {code} > the problem with this is that if root unpacks this tarball the files are > owned by whatever the uid is of the person that created the tarball. this uid > probably doesnt exist or belongs to a different unrelated user. this is > unlikely to be what anyone wants. > for other users this problem doesnt exist since tar is now allowed to change > uid. so when they unpack the tarball the files are owned by them. > it is more typical to set the uid and gid to 0 for a tarball. that way when > root unpacks it the files are owned by root. so like this: > {code:bash} > tar czf "spark-$VERSION-bin-$NAME.tgz" --numeric-owner --owner=0 --group=0 -C > "$SPARK_HOME" "$TARDIR_NAME" > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27512) Decimal parsing leads to unexpected type inference
[ https://issues.apache.org/jira/browse/SPARK-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822790#comment-16822790 ] koert kuipers commented on SPARK-27512: --- [~maxgekk] max do you know why getDecimalParser has that if condition for Locale US where it calls {code} (s: String) => new java.math.BigDecimal(s.replaceAll(",", "")) {code} i think it's that {code}s.replaceAll(",", ""){code} that is causing my issues. i saw it was introduced in: {code} commit 7a83d71403edf7d24fa5efc0ef913f3ce76d88b8 Author: Maxim Gekk Date: Thu Nov 29 22:15:12 2018 +0800 [SPARK-26163][SQL] Parsing decimals from JSON using locale {code} > Decimal parsing leads to unexpected type inference > -- > > Key: SPARK-27512 > URL: https://issues.apache.org/jira/browse/SPARK-27512 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 > Environment: spark 3.0.0-SNAPSHOT from this commit: > {code:bash} > commit 3ab96d7acf870e53c9016b0b63d0b328eec23bed > Author: Dilip Biswal > Date: Mon Apr 15 21:26:45 2019 +0800 > {code} >Reporter: koert kuipers >Priority: Minor > > {code:bash} > $ hadoop fs -text test.bsv > x|y > 1|1,2 > 2|2,3 > 3|3,4 > {code} > in spark 2.4.1: > {code:bash} > scala> val data = spark.read.format("csv").option("header", > true).option("delimiter", "|").option("inferSchema", true).load("test.bsv") > scala> data.printSchema > root > |-- x: integer (nullable = true) > |-- y: string (nullable = true) > scala> data.show > +---+---+ > | x| y| > +---+---+ > | 1|1,2| > | 2|2,3| > | 3|3,4| > +---+---+ > {code} > in spark 3.0.0-SNAPSHOT: > {code:bash} > scala> val data = spark.read.format("csv").option("header", > true).option("delimiter", "|").option("inferSchema", true).load("test.bsv") > scala> data.printSchema > root > |-- x: integer (nullable = true) > |-- y: decimal(2,0) (nullable = true) > scala> data.show > +---+---+ > | x| y| > +---+---+ > | 1| 12| > | 2| 23| > | 3| 34| > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27512) Decimal parsing leads to unexpected type inference
[ https://issues.apache.org/jira/browse/SPARK-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822790#comment-16822790 ] koert kuipers edited comment on SPARK-27512 at 4/21/19 11:03 PM: - [~maxgekk] maxim do you know why getDecimalParser has that if condition for Locale US where it calls {code:java} (s: String) => new java.math.BigDecimal(s.replaceAll(",", "")) {code} i think it's that {code:java} s.replaceAll(",", ""){code} that is causing my issues. i saw it was introduced in: {code:java} commit 7a83d71403edf7d24fa5efc0ef913f3ce76d88b8 Author: Maxim Gekk Date: Thu Nov 29 22:15:12 2018 +0800 [SPARK-26163][SQL] Parsing decimals from JSON using locale {code} was (Author: koert): [~maxgekk] max do you know why getDecimalParser has that if condition for Locale US where it calls {code} (s: String) => new java.math.BigDecimal(s.replaceAll(",", "")) {code} i think it's that {code}s.replaceAll(",", ""){code} that is causing my issues. i saw it was introduced in: {code} commit 7a83d71403edf7d24fa5efc0ef913f3ce76d88b8 Author: Maxim Gekk Date: Thu Nov 29 22:15:12 2018 +0800 [SPARK-26163][SQL] Parsing decimals from JSON using locale {code} > Decimal parsing leads to unexpected type inference > -- > > Key: SPARK-27512 > URL: https://issues.apache.org/jira/browse/SPARK-27512 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 > Environment: spark 3.0.0-SNAPSHOT from this commit: > {code:bash} > commit 3ab96d7acf870e53c9016b0b63d0b328eec23bed > Author: Dilip Biswal > Date: Mon Apr 15 21:26:45 2019 +0800 > {code} >Reporter: koert kuipers >Priority: Minor > > {code:bash} > $ hadoop fs -text test.bsv > x|y > 1|1,2 > 2|2,3 > 3|3,4 > {code} > in spark 2.4.1: > {code:bash} > scala> val data = spark.read.format("csv").option("header", > true).option("delimiter", "|").option("inferSchema", true).load("test.bsv") > scala> data.printSchema > root > |-- x: integer (nullable = true) > |-- y: string (nullable = true) > scala> data.show > +---+---+ > | x| y| > +---+---+ > | 1|1,2| > | 2|2,3| > | 3|3,4| > +---+---+ > {code} > in spark 3.0.0-SNAPSHOT: > {code:bash} > scala> val data = spark.read.format("csv").option("header", > true).option("delimiter", "|").option("inferSchema", true).load("test.bsv") > scala> data.printSchema > root > |-- x: integer (nullable = true) > |-- y: decimal(2,0) (nullable = true) > scala> data.show > +---+---+ > | x| y| > +---+---+ > | 1| 12| > | 2| 23| > | 3| 34| > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27512) Decimal parsing leads to unexpected type inference
[ https://issues.apache.org/jira/browse/SPARK-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822796#comment-16822796 ] koert kuipers commented on SPARK-27512: --- seems DecimalFormat.parse also simply ignores commas. still unclear to me why we have to do same in the "special handling of default locale for backwards compatibility" but i am guessing that has to do with json parsing backwards compatibility, not csv backwards compatibility. > Decimal parsing leads to unexpected type inference > -- > > Key: SPARK-27512 > URL: https://issues.apache.org/jira/browse/SPARK-27512 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 > Environment: spark 3.0.0-SNAPSHOT from this commit: > {code:bash} > commit 3ab96d7acf870e53c9016b0b63d0b328eec23bed > Author: Dilip Biswal > Date: Mon Apr 15 21:26:45 2019 +0800 > {code} >Reporter: koert kuipers >Priority: Minor > > {code:bash} > $ hadoop fs -text test.bsv > x|y > 1|1,2 > 2|2,3 > 3|3,4 > {code} > in spark 2.4.1: > {code:bash} > scala> val data = spark.read.format("csv").option("header", > true).option("delimiter", "|").option("inferSchema", true).load("test.bsv") > scala> data.printSchema > root > |-- x: integer (nullable = true) > |-- y: string (nullable = true) > scala> data.show > +---+---+ > | x| y| > +---+---+ > | 1|1,2| > | 2|2,3| > | 3|3,4| > +---+---+ > {code} > in spark 3.0.0-SNAPSHOT: > {code:bash} > scala> val data = spark.read.format("csv").option("header", > true).option("delimiter", "|").option("inferSchema", true).load("test.bsv") > scala> data.printSchema > root > |-- x: integer (nullable = true) > |-- y: decimal(2,0) (nullable = true) > scala> data.show > +---+---+ > | x| y| > +---+---+ > | 1| 12| > | 2| 23| > | 3| 34| > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
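To make the two parsing paths discussed above concrete, a small standalone sketch (not the actual getDecimalParser code) showing that both the US-locale fast path and DecimalFormat turn "1,2" into 12:
{code}
import java.text.{DecimalFormat, NumberFormat, ParsePosition}
import java.util.Locale

// US-locale fast path quoted above: strip grouping commas, then parse.
val viaReplace = new java.math.BigDecimal("1,2".replaceAll(",", ""))

// Locale-aware path: DecimalFormat treats ',' as a grouping separator and also yields 12.
val fmt = NumberFormat.getInstance(Locale.US).asInstanceOf[DecimalFormat]
fmt.setParseBigDecimal(true)
val viaFormat = fmt.parse("1,2", new ParsePosition(0))

println(s"$viaReplace $viaFormat") // 12 12
{code}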
[jira] [Resolved] (SPARK-27496) RPC should send back the fatal errors
[ https://issues.apache.org/jira/browse/SPARK-27496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27496. --- Resolution: Fixed Fix Version/s: 2.4.3 3.0.0 2.3.4 This is resolved via https://github.com/apache/spark/pull/24396 > RPC should send back the fatal errors > - > > Key: SPARK-27496 > URL: https://issues.apache.org/jira/browse/SPARK-27496 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.3.4, 3.0.0, 2.4.3 > > > Right now, when a fatal error throws from "receiveAndReply", the sender will > not be notified. We should try our best to send it back. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27536) Code improvements for 3.0: existentials edition
Sean Owen created SPARK-27536: - Summary: Code improvements for 3.0: existentials edition Key: SPARK-27536 URL: https://issues.apache.org/jira/browse/SPARK-27536 Project: Spark Issue Type: Improvement Components: ML, Spark Core, SQL, Structured Streaming Affects Versions: 3.0.0 Reporter: Sean Owen Assignee: Sean Owen The Spark code base makes use of 'existential types' in Scala, a language feature which is quasi-deprecated -- it generates a warning unless scala.language.existentials is imported, and there is talk of removing it from future Scala versions: https://contributors.scala-lang.org/t/proposal-to-remove-existential-types-from-the-language/2785 We can get rid of most usages of this feature with lots of minor changes to the code. A PR is coming to demonstrate what's involved. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27537) spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object
dingwei2019 created SPARK-27537: --- Summary: spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object Key: SPARK-27537 URL: https://issues.apache.org/jira/browse/SPARK-27537 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 2.4.1, 2.3.0 Environment: Machine:aarch64 OS:Red Hat Enterprise Linux Server release 7.4 Kernel:4.11.0-44.el7a spark version: spark-2.4.1 java:openjdk version "11.0.2" 2019-01-15 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) scala:2.11.12 gcc version:4.8.5 Reporter: dingwei2019 [ERROR]: [Error] $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object [ERROR]: [Error] $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value size is not a member of Object ERROR: two errors found Below is the related code: 856 test("toString") { 857 val empty = Matrices.ones(0, 0) 858 empty.toString(0, 0) 859 860 val mat = Matrices.rand(5, 10, new Random()) 861 mat.toString(-1, -5) 862 mat.toString(0, 0) 863 mat.toString(Int.MinValue, Int.MinValue) 864 mat.toString(Int.MaxValue, Int.MaxValue) 865 var lines = mat.toString(6, 50).lines.toArray 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) 867 868 lines = mat.toString(5, 100).lines.toArray 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) 870 } 871 872 test("numNonzeros and numActives") { 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) 874 assert(dm1.numNonzeros === 3) 875 assert(dm1.numActives === 6) 876 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), Array(0.0, -1.2, 0.0)) 878 assert(sm1.numNonzeros === 1) 879 assert(sm1.numActives === 3) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27537) spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object
[ https://issues.apache.org/jira/browse/SPARK-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dingwei2019 updated SPARK-27537: Docs Text: (was: the question is found in spark ml test module, althrough this is an test module, i want to figure it out. from the describe above, it seems an incompatible problem between java 11 and scala 2.11.12. if I change my jdk to jdk8, and there is no problem. Below is my analysis: it seems in spark if a method has implementation in java, spark will use java method, or will use scala method. 'string' class in java11 adds the lines method. This method conflicts with the scala syntax. scala has lines method in 'stringlike' class, the method return an Iterator; Iterator in scala has a toArray method, the method return an Array; the class array in scala has a size method. so if spark use scala method, it will have no problem. lines(Iterator)-->toArray(Array)-->size But Java11 adds lines method in 'string', this will return a Stream; Stream in java11 has toArray method, and will return Object; Object has no 'size' method. This is what the error says. (Stream)-->(Object)toArray-->has no size method.) > spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > - > > Key: SPARK-27537 > URL: https://issues.apache.org/jira/browse/SPARK-27537 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.3.0, 2.4.1 > Environment: Machine:aarch64 > OS:Red Hat Enterprise Linux Server release 7.4 > Kernel:4.11.0-44.el7a > spark version: spark-2.4.1 > java:openjdk version "11.0.2" 2019-01-15 > OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) > scala:2.11.12 > gcc version:4.8.5 >Reporter: dingwei2019 >Priority: Major > Labels: build, test > > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value > size is not a member of Object > ERROR: two errors found > Below is the related code: > 856 test("toString") { > 857 val empty = Matrices.ones(0, 0) > 858 empty.toString(0, 0) > 859 > 860 val mat = Matrices.rand(5, 10, new Random()) > 861 mat.toString(-1, -5) > 862 mat.toString(0, 0) > 863 mat.toString(Int.MinValue, Int.MinValue) > 864 mat.toString(Int.MaxValue, Int.MaxValue) > 865 var lines = mat.toString(6, 50).lines.toArray > 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) > 867 > 868 lines = mat.toString(5, 100).lines.toArray > 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) > 870 } > 871 > 872 test("numNonzeros and numActives") { > 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) > 874 assert(dm1.numNonzeros === 3) > 875 assert(dm1.numActives === 6) > 876 > 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), > Array(0.0, -1.2, 0.0)) > 878 assert(sm1.numNonzeros === 1) > 879 assert(sm1.numActives === 3) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27537) spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object
[ https://issues.apache.org/jira/browse/SPARK-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822845#comment-16822845 ] dingwei2019 commented on SPARK-27537: - The problem was found in the Spark ML test module. Although it is only a test module, I want to understand it. From the description above it looks like an incompatibility between Java 11 and Scala 2.11.12; if I switch my JDK to JDK 8, the problem goes away. Below is my analysis: it seems that when a method exists on the Java class, the Java method is resolved, and otherwise the Scala one is used. The String class in Java 11 adds a lines() method, and this conflicts with the Scala API: Scala has a lines method in StringLike that returns an Iterator; Iterator has a toArray method that returns an Array; and Array has a size method. So if the Scala method were chosen, there would be no problem: lines (Iterator) --> toArray (Array) --> size. But Java 11's new String.lines() returns a Stream; Stream.toArray() returns Object[], and Object has no size member. That is exactly what the error reports: (Stream) --> toArray (Object) --> no size method. > spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > - > > Key: SPARK-27537 > URL: https://issues.apache.org/jira/browse/SPARK-27537 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.3.0, 2.4.1 > Environment: Machine:aarch64 > OS:Red Hat Enterprise Linux Server release 7.4 > Kernel:4.11.0-44.el7a > spark version: spark-2.4.1 > java:openjdk version "11.0.2" 2019-01-15 > OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) > scala:2.11.12 > gcc version:4.8.5 >Reporter: dingwei2019 >Priority: Major > Labels: build, test > > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value > size is not a member of Object > ERROR: two errors found > Below is the related code: > 856 test("toString") { > 857 val empty = Matrices.ones(0, 0) > 858 empty.toString(0, 0) > 859 > 860 val mat = Matrices.rand(5, 10, new Random()) > 861 mat.toString(-1, -5) > 862 mat.toString(0, 0) > 863 mat.toString(Int.MinValue, Int.MinValue) > 864 mat.toString(Int.MaxValue, Int.MaxValue) > 865 var lines = mat.toString(6, 50).lines.toArray > 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) > 867 > 868 lines = mat.toString(5, 100).lines.toArray > 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) > 870 } > 871 > 872 test("numNonzeros and numActives") { > 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) > 874 assert(dm1.numNonzeros === 3) > 875 assert(dm1.numActives === 6) > 876 > 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), > Array(0.0, -1.2, 0.0)) > 878 assert(sm1.numNonzeros === 1) > 879 assert(sm1.numActives === 3) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
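A minimal Scala sketch of two ways the ambiguity described in the comment above can be sidestepped on JDK 11 with Scala 2.11/2.12. This is not the fix Spark ultimately shipped for JDK 11 support; it only illustrates the method-resolution point, reusing the Matrices setup and assertions from the quoted test:
{code}
import java.util.Random
import org.apache.spark.ml.linalg.Matrices

val mat = Matrices.rand(5, 10, new Random())

// Option 1: avoid the ambiguous .lines call and split on newlines explicitly,
// which yields a Scala Array[String] that has length/size members.
val lines: Array[String] = mat.toString(6, 50).split("\n")
assert(lines.length == 5 && lines.forall(_.length <= 50))

// Option 2: force the Scala StringOps view so StringLike.lines
// (returning Iterator[String]) is chosen instead of JDK 11's String.lines().
val lines2 = scala.Predef.augmentString(mat.toString(5, 100)).lines.toArray
assert(lines2.size == 5 && lines2.forall(_.size <= 100))
{code}
Option 1 assumes the matrix string uses "\n" line separators; both variants compile the same way on JDK 8 and JDK 11 because the receiver of lines/size is an unambiguous Scala type.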
[jira] [Comment Edited] (SPARK-27537) spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object
[ https://issues.apache.org/jira/browse/SPARK-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822845#comment-16822845 ] dingwei2019 edited comment on SPARK-27537 at 4/22/19 3:15 AM: -- The problem was found in the Spark ML test module. Although it is only a test module, I want to understand it. From the description above it looks like an incompatibility between Java 11 and Scala 2.11.12; if I switch my JDK to JDK 8, the problem goes away. Below is my analysis: it seems that when a method exists on the Java class, the Java method is resolved, and otherwise the Scala one is used. The String class in Java 11 adds a lines() method, and this conflicts with the Scala API: Scala has a lines method in StringLike that returns an Iterator; Iterator has a toArray method that returns an Array; and Array has a size method. So if the Scala method were chosen, there would be no problem: lines (Iterator) --> toArray (Array) --> size. But Java 11's new String.lines() returns a Stream; Stream.toArray() returns Object[], and Object has no size member. That is exactly what the error reports: (Stream) --> toArray (Object) --> no size method. What should I do to solve this problem? was (Author: dingwei2019): The problem was found in the Spark ML test module. Although it is only a test module, I want to understand it. From the description above it looks like an incompatibility between Java 11 and Scala 2.11.12; if I switch my JDK to JDK 8, the problem goes away. Below is my analysis: it seems that when a method exists on the Java class, the Java method is resolved, and otherwise the Scala one is used. The String class in Java 11 adds a lines() method, and this conflicts with the Scala API: Scala has a lines method in StringLike that returns an Iterator; Iterator has a toArray method that returns an Array; and Array has a size method. So if the Scala method were chosen, there would be no problem: lines (Iterator) --> toArray (Array) --> size. But Java 11's new String.lines() returns a Stream; Stream.toArray() returns Object[], and Object has no size member. That is exactly what the error reports: (Stream) --> toArray (Object) --> no size method.
> spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > - > > Key: SPARK-27537 > URL: https://issues.apache.org/jira/browse/SPARK-27537 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.3.0, 2.4.1 > Environment: Machine:aarch64 > OS:Red Hat Enterprise Linux Server release 7.4 > Kernel:4.11.0-44.el7a > spark version: spark-2.4.1 > java:openjdk version "11.0.2" 2019-01-15 > OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) > scala:2.11.12 > gcc version:4.8.5 >Reporter: dingwei2019 >Priority: Major > Labels: build, test > > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value > size is not a member of Object > ERROR: two errors found > Below is the related code: > 856 test("toString") { > 857 val empty = Matrices.ones(0, 0) > 858 empty.toString(0, 0) > 859 > 860 val mat = Matrices.rand(5, 10, new Random()) > 861 mat.toString(-1, -5) > 862 mat.toString(0, 0) > 863 mat.toString(Int.MinValue, Int.MinValue) > 864 mat.toString(Int.MaxValue, Int.MaxValue) > 865 var lines = mat.toString(6, 50).lines.toArray > 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) > 867 > 868 lines = mat.toString(5, 100).lines.toArray > 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) > 870 } > 871 > 872 test("numNonzeros and numActives") { > 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) > 874 assert(dm1.numNonzeros === 3) > 875 assert(dm1.numActives === 6) > 876 > 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), > Array(0.0, -1.2, 0.0)) > 878 assert(sm1.numNonzeros === 1) > 879 assert(sm1.numActives === 3) > what shall i do to solve this problem, and when will spark support jdk11? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27537) spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object
[ https://issues.apache.org/jira/browse/SPARK-27537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dingwei2019 updated SPARK-27537: Description: [ERROR]: [Error] $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object [ERROR]: [Error] $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value size is not a member of Object ERROR: two errors found Below is the related code: 856 test("toString") { 857 val empty = Matrices.ones(0, 0) 858 empty.toString(0, 0) 859 860 val mat = Matrices.rand(5, 10, new Random()) 861 mat.toString(-1, -5) 862 mat.toString(0, 0) 863 mat.toString(Int.MinValue, Int.MinValue) 864 mat.toString(Int.MaxValue, Int.MaxValue) 865 var lines = mat.toString(6, 50).lines.toArray 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) 867 868 lines = mat.toString(5, 100).lines.toArray 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) 870 } 871 872 test("numNonzeros and numActives") { 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) 874 assert(dm1.numNonzeros === 3) 875 assert(dm1.numActives === 6) 876 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), Array(0.0, -1.2, 0.0)) 878 assert(sm1.numNonzeros === 1) 879 assert(sm1.numActives === 3) what shall i do to solve this problem, and when will spark support jdk11? was: [ERROR]: [Error] $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value size is not a member of Object [ERROR]: [Error] $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value size is not a member of Object ERROR: two errors found Below is the related code: 856 test("toString") { 857 val empty = Matrices.ones(0, 0) 858 empty.toString(0, 0) 859 860 val mat = Matrices.rand(5, 10, new Random()) 861 mat.toString(-1, -5) 862 mat.toString(0, 0) 863 mat.toString(Int.MinValue, Int.MinValue) 864 mat.toString(Int.MaxValue, Int.MaxValue) 865 var lines = mat.toString(6, 50).lines.toArray 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) 867 868 lines = mat.toString(5, 100).lines.toArray 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) 870 } 871 872 test("numNonzeros and numActives") { 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) 874 assert(dm1.numNonzeros === 3) 875 assert(dm1.numActives === 6) 876 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), Array(0.0, -1.2, 0.0)) 878 assert(sm1.numNonzeros === 1) 879 assert(sm1.numActives === 3) > spark-2.4.1/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > - > > Key: SPARK-27537 > URL: https://issues.apache.org/jira/browse/SPARK-27537 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.3.0, 2.4.1 > Environment: Machine:aarch64 > OS:Red Hat Enterprise Linux Server release 7.4 > Kernel:4.11.0-44.el7a > spark version: spark-2.4.1 > java:openjdk version "11.0.2" 2019-01-15 > OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) > scala:2.11.12 > gcc version:4.8.5 >Reporter: dingwei2019 >Priority: Major > Labels: build, test > > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.866:Value > size is not a member of Object > [ERROR]: [Error] > $SPARK_HOME/mlib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala.869:Value > size is not a member of Object > ERROR: two errors found > Below 
is the related code: > 856 test("toString") { > 857 val empty = Matrices.ones(0, 0) > 858 empty.toString(0, 0) > 859 > 860 val mat = Matrices.rand(5, 10, new Random()) > 861 mat.toString(-1, -5) > 862 mat.toString(0, 0) > 863 mat.toString(Int.MinValue, Int.MinValue) > 864 mat.toString(Int.MaxValue, Int.MaxValue) > 865 var lines = mat.toString(6, 50).lines.toArray > 866 assert(lines.size == 5 && lines.forall(_.size <= 50)) > 867 > 868 lines = mat.toString(5, 100).lines.toArray > 869 assert(lines.size == 5 && lines.forall(_.size <= 100)) > 870 } > 871 > 872 test("numNonzeros and numActives") { > 873 val dm1 = Matrices.dense(3, 2, Array(0, 0, -1, 1, 0, 1)) > 874 assert(dm1.numNonzeros === 3) > 875 assert(dm1.numActives === 6) > 876 > 877 val sm1 = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), > Array(0.0, -1.2, 0.0)) > 878 assert(sm1.numNonzeros === 1) > 879 assert(sm
[jira] [Created] (SPARK-27538) sparksql could not start in jdk11, exception org.datanucleus.exceptions.NucleusException: The java type java.lang.Long (jdbc-type='', sql-type="") cant be mapped for thi
dingwei2019 created SPARK-27538: --- Summary: sparksql could not start in jdk11, exception org.datanucleus.exceptions.NucleusException: The java type java.lang.Long (jdbc-type='', sql-type="") cant be mapped for this datastore. No mapping is available. Key: SPARK-27538 URL: https://issues.apache.org/jira/browse/SPARK-27538 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.1, 2.3.0 Environment: Machine:aarch64 OS:Red Hat Enterprise Linux Server release 7.4 Kernel:4.11.0-44.el7a spark version: spark-2.4.1 java:openjdk version "11.0.2" 2019-01-15 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) scala:2.11.12 Reporter: dingwei2019 [root@172-19-18-8 spark-2.4.1-bin-hadoop2.7-bak]# bin/spark-sql WARNING: An illegal reflective access operation has occurred WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/dingwei/spark-2.4.1-bin-x86/spark-2.4.1-bin-hadoop2.7-bak/jars/spark-unsafe_2.11-2.4.1.jar) to method java.nio.Bits.unaligned() WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release 2019-04-22 11:27:34,419 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2019-04-22 11:27:35,306 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 2019-04-22 11:27:35,330 INFO metastore.ObjectStore: ObjectStore, initialize called 2019-04-22 11:27:35,492 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 2019-04-22 11:27:35,492 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored 2019-04-22 11:27:37,012 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 2019-04-22 11:27:37,638 WARN DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDatabase and subclasses resulted in no possible candidates The java type java.lang.Long (jdbc-type="", sql-type="") cant be mapped for this datastore. No mapping is available. org.datanucleus.exceptions.NucleusException: The java type java.lang.Long (jdbc-type="", sql-type="") cant be mapped for this datastore. No mapping is available. 
at org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1215) at org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1378) at org.datanucleus.store.rdbms.table.AbstractClassTable.addDatastoreId(AbstractClassTable.java:392) at org.datanucleus.store.rdbms.table.ClassTable.initializePK(ClassTable.java:1087) at org.datanucleus.store.rdbms.table.ClassTable.preInitialize(ClassTable.java:247) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTable(RDBMSStoreManager.java:3118) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTables(RDBMSStoreManager.java:2909) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3182) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841) at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122) at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605) at org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954) at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679) at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408) at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:947) at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:370) at org.datanucleus.store.query.Query.executeQuery(Query.java:1744) at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672) at org.datanucleus.store.query.Query.execute(Query.java:1654) at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:183) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.(MetaStoreDirectSql.java:137) at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:295) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java
[jira] [Commented] (SPARK-27538) sparksql could not start in jdk11, exception org.datanucleus.exceptions.NucleusException: The java type java.lang.Long (jdbc-type='', sql-type="") cant be mapped for t
[ https://issues.apache.org/jira/browse/SPARK-27538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822889#comment-16822889 ] Yuming Wang commented on SPARK-27538: - Support for JDK 11 is still in progress: SPARK-24417 > sparksql could not start in jdk11, exception > org.datanucleus.exceptions.NucleusException: The java type java.lang.Long > (jdbc-type='', sql-type="") cant be mapped for this datastore. No mapping is > available. > -- > > Key: SPARK-27538 > URL: https://issues.apache.org/jira/browse/SPARK-27538 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.1 > Environment: Machine:aarch64 > OS:Red Hat Enterprise Linux Server release 7.4 > Kernel:4.11.0-44.el7a > spark version: spark-2.4.1 > java:openjdk version "11.0.2" 2019-01-15 > OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9) > scala:2.11.12 >Reporter: dingwei2019 >Priority: Major > Labels: features > > [root@172-19-18-8 spark-2.4.1-bin-hadoop2.7-bak]# bin/spark-sql > WARNING: An illegal reflective access operation has occurred > WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform > (file:/home/dingwei/spark-2.4.1-bin-x86/spark-2.4.1-bin-hadoop2.7-bak/jars/spark-unsafe_2.11-2.4.1.jar) > to method java.nio.Bits.unaligned() > WARNING: Please consider reporting this to the maintainers of > org.apache.spark.unsafe.Platform > WARNING: Use --illegal-access=warn to enable warnings of further illegal > reflective access operations > WARNING: All illegal access operations will be denied in a future release > 2019-04-22 11:27:34,419 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > 2019-04-22 11:27:35,306 INFO metastore.HiveMetaStore: 0: Opening raw store > with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore > 2019-04-22 11:27:35,330 INFO metastore.ObjectStore: ObjectStore, initialize > called > 2019-04-22 11:27:35,492 INFO DataNucleus.Persistence: Property > hive.metastore.integral.jdo.pushdown unknown - will be ignored > 2019-04-22 11:27:35,492 INFO DataNucleus.Persistence: Property > datanucleus.cache.level2 unknown - will be ignored > 2019-04-22 11:27:37,012 INFO metastore.ObjectStore: Setting MetaStore object > pin classes with > hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" > 2019-04-22 11:27:37,638 WARN DataNucleus.Query: Query for candidates of > org.apache.hadoop.hive.metastore.model.MDatabase and subclasses resulted in > no possible candidates > The java type java.lang.Long (jdbc-type="", sql-type="") cant be mapped for > this datastore. No mapping is available. > org.datanucleus.exceptions.NucleusException: The java type java.lang.Long > (jdbc-type="", sql-type="") cant be mapped for this datastore. No mapping is > available. 
> at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1215) > at > org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1378) > at > org.datanucleus.store.rdbms.table.AbstractClassTable.addDatastoreId(AbstractClassTable.java:392) > at > org.datanucleus.store.rdbms.table.ClassTable.initializePK(ClassTable.java:1087) > at > org.datanucleus.store.rdbms.table.ClassTable.preInitialize(ClassTable.java:247) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTable(RDBMSStoreManager.java:3118) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTables(RDBMSStoreManager.java:2909) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3182) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841) > at > org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605) > at > org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679) > at > org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408) > at > org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:947) > at > org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:370) > at org.datanucleus.store.query.Query.executeQue
[jira] [Commented] (SPARK-13263) SQL generation support for tablesample
[ https://issues.apache.org/jira/browse/SPARK-13263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822891#comment-16822891 ] angerszhu commented on SPARK-13263: --- [~Tagar] I have made some changes in Spark SQL's AstBuilder that can support this. > SQL generation support for tablesample > -- > > Key: SPARK-13263 > URL: https://issues.apache.org/jira/browse/SPARK-13263 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Major > Fix For: 2.0.0 > > > {code} > SELECT s.id FROM t0 TABLESAMPLE(0.1 PERCENT) s > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org