[jira] [Assigned] (SPARK-15959) Add the support of hive.metastore.warehouse.dir back
[ https://issues.apache.org/jira/browse/SPARK-15959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai reassigned SPARK-15959: Assignee: Yin Huai > Add the support of hive.metastore.warehouse.dir back > > > Key: SPARK-15959 > URL: https://issues.apache.org/jira/browse/SPARK-15959 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Critical > Labels: release_notes, releasenotes > > Right now, we do not load the value of this conf at all > (https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSharedState.scala#L35-L41). > Let's maintain backward compatibility by loading it if Spark's warehouse > conf is not set. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-15961) Audit new SQL confs
[ https://issues.apache.org/jira/browse/SPARK-15961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell closed SPARK-15961. - Resolution: Duplicate > Audit new SQL confs > > > Key: SPARK-15961 > URL: https://issues.apache.org/jira/browse/SPARK-15961 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Herman van Hovell >Assignee: Herman van Hovell > > Check the current SQL configuration names for inconsistencies.
[jira] [Updated] (SPARK-15959) Add the support of hive.metastore.warehouse.dir back
[ https://issues.apache.org/jira/browse/SPARK-15959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-15959: - Labels: release_notes releasenotes (was: ) > Add the support of hive.metastore.warehouse.dir back > > > Key: SPARK-15959 > URL: https://issues.apache.org/jira/browse/SPARK-15959 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Priority: Critical > Labels: release_notes, releasenotes > > Right now, we do not load the value of this conf at all > (https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSharedState.scala#L35-L41). > Let's maintain backward compatibility by loading it if Spark's warehouse > conf is not set.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331168#comment-15331168 ] Jinxia Liu commented on SPARK-12177: Thanks Cody! > Update KafkaDStreams to new Kafka 0.10 Consumer API > --- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 has already been released and it introduces a new consumer API that is not > compatible with the old one. So, I added the new consumer API. I made separate > classes in package org.apache.spark.streaming.kafka.v09 with the changed API. I > didn't remove the old classes, for better backward compatibility. Users will not need > to change their old Spark applications when they upgrade to a new Spark version. > Please review my changes
[jira] [Commented] (SPARK-15815) Hang while enable blacklistExecutor and DynamicExecutorAllocator
[ https://issues.apache.org/jira/browse/SPARK-15815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331132#comment-15331132 ] Imran Rashid commented on SPARK-15815: -- [~SuYan] is this the same as https://issues.apache.org/jira/browse/SPARK-15865 ? The situation you are describing seems the same, though that issue doesn't only affect Dynamic Allocation. Perhaps there is something better you can do with dynamic allocation as well, but maybe that is a different issue. Take a look at the latest design doc I posted on SPARK-8426 to see if that addresses your concern. > Hang while enable blacklistExecutor and DynamicExecutorAllocator > - > > Key: SPARK-15815 > URL: https://issues.apache.org/jira/browse/SPARK-15815 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 1.6.1 >Reporter: SuYan >Priority: Minor > > Enable BlacklistExecutor with a blacklist time larger than 120s, and enable > DynamicAllocate with minExecutors = 0. > 1. Assume there is only 1 task left running, in Executor A, and all other executors > have timed out. > 2. The task failed, so it will not be scheduled on the current Executor A, due to > the enabled blacklist time. > 3. The ExecutorAllocateManager always requests targetNumExecutor=1 > executors; because we already have Executor A, oldTargetNumExecutor == > targetNumExecutor = 1, so it will never add more executors... even if Executor A > has timed out. It endlessly requests delta=0 executors.
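The delta=0 loop described in the report can be sketched in plain Scala as a simplified model (all names here are illustrative, not Spark's actual ExecutorAllocationManager internals):

```scala
// Simplified model of the reported hang: the allocation manager compares its
// target executor count to the current count, while the scheduler separately
// refuses to run the task on the one live executor because it is blacklisted.
case class ClusterState(liveExecutors: Set[String], blacklisted: Set[String])

// The manager asks for (target - current) new executors.
def executorsToRequest(target: Int, state: ClusterState): Int =
  math.max(0, target - state.liveExecutors.size)

// The scheduler can only place the task on a non-blacklisted executor.
def schedulableExecutors(state: ClusterState): Set[String] =
  state.liveExecutors -- state.blacklisted

val state = ClusterState(liveExecutors = Set("A"), blacklisted = Set("A"))
val delta = executorsToRequest(1, state)   // 1 - 1 = 0: nothing new is requested
val runnable = schedulableExecutors(state) // empty: the task can never be scheduled
```

With target = 1 and Executor A both alive and blacklisted, the requested delta stays 0 while no executor is schedulable, so the job hangs.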
[jira] [Updated] (SPARK-15815) Hang while enable blacklistExecutor and DynamicExecutorAllocator
[ https://issues.apache.org/jira/browse/SPARK-15815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-15815: - Component/s: Scheduler > Hang while enable blacklistExecutor and DynamicExecutorAllocator > - > > Key: SPARK-15815 > URL: https://issues.apache.org/jira/browse/SPARK-15815 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 1.6.1 >Reporter: SuYan >Priority: Minor > > Enable BlacklistExecutor with a blacklist time larger than 120s, and enable > DynamicAllocate with minExecutors = 0. > 1. Assume there is only 1 task left running, in Executor A, and all other executors > have timed out. > 2. The task failed, so it will not be scheduled on the current Executor A, due to > the enabled blacklist time. > 3. The ExecutorAllocateManager always requests targetNumExecutor=1 > executors; because we already have Executor A, oldTargetNumExecutor == > targetNumExecutor = 1, so it will never add more executors... even if Executor A > has timed out. It endlessly requests delta=0 executors.
[jira] [Commented] (SPARK-13928) Move org.apache.spark.Logging into org.apache.spark.internal.Logging
[ https://issues.apache.org/jira/browse/SPARK-13928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331119#comment-15331119 ] Reynold Xin commented on SPARK-13928: - It was never meant to be public (the comment had a note saying it's private). You can certainly copy the code out (just a few lines of code) and put it in your own project. > Move org.apache.spark.Logging into org.apache.spark.internal.Logging > > > Key: SPARK-13928 > URL: https://issues.apache.org/jira/browse/SPARK-13928 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Reynold Xin >Assignee: Wenchen Fan > Fix For: 2.0.0 > > > Logging was made private in Spark 2.0. If we move it, then users would be > able to create a Logging trait themselves to avoid changing their own code. > Alternatively, we can also provide a compatibility package that adds > logging.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331114#comment-15331114 ] Cody Koeninger commented on SPARK-12177: [~jinx...@ebay.com] looks like that test had some flaky timing; I cleaned it up a bit and it passed 5 times in a row locally. We'll see how it does on Jenkins > Update KafkaDStreams to new Kafka 0.10 Consumer API > --- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 has already been released and it introduces a new consumer API that is not > compatible with the old one. So, I added the new consumer API. I made separate > classes in package org.apache.spark.streaming.kafka.v09 with the changed API. I > didn't remove the old classes, for better backward compatibility. Users will not need > to change their old Spark applications when they upgrade to a new Spark version. > Please review my changes
[jira] [Created] (SPARK-15961) Audit new SQL confs
Herman van Hovell created SPARK-15961: - Summary: Audit new SQL confs Key: SPARK-15961 URL: https://issues.apache.org/jira/browse/SPARK-15961 Project: Spark Issue Type: Bug Components: SQL Reporter: Herman van Hovell Assignee: Herman van Hovell Check the current SQL configuration names for inconsistencies.
[jira] [Created] (SPARK-15960) Audit new SQL confs
Herman van Hovell created SPARK-15960: - Summary: Audit new SQL confs Key: SPARK-15960 URL: https://issues.apache.org/jira/browse/SPARK-15960 Project: Spark Issue Type: Bug Components: SQL Reporter: Herman van Hovell Assignee: Herman van Hovell Check the current SQL configuration names for inconsistencies.
[jira] [Commented] (SPARK-15824) Run 'with ... insert ... select' failed when use spark thriftserver
[ https://issues.apache.org/jira/browse/SPARK-15824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331027#comment-15331027 ] Apache Spark commented on SPARK-15824: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/13678 > Run 'with ... insert ... select' failed when use spark thriftserver > --- > > Key: SPARK-15824 > URL: https://issues.apache.org/jira/browse/SPARK-15824 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Weizhong >Priority: Minor > > {code:sql} > create table src(k int, v int); > create table src_parquet(k int, v int); > with v as (select 1, 2) insert into table src_parquet from src; > {code} > Will throw exception: spark.sql.execution.id is already set.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331026#comment-15331026 ] Jinxia Liu commented on SPARK-12177: [~c...@koeninger.org] thanks for the quick reply. 1. Glad to know you are checking it. 2. The Kafka 0.10 consumer is not difficult to use, I agree, but in most cases with the connector the consumer gets assigned the topics rather than subscribing to them, so the connector needs to know all the partitions of a topic; if the upstream Kafka setup changes, the consumer code needs to change manually. Maybe there are two sides to this issue; since you are against this, let's keep the code as it is now. > Update KafkaDStreams to new Kafka 0.10 Consumer API > --- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 has already been released and it introduces a new consumer API that is not > compatible with the old one. So, I added the new consumer API. I made separate > classes in package org.apache.spark.streaming.kafka.v09 with the changed API. I > didn't remove the old classes, for better backward compatibility. Users will not need > to change their old Spark applications when they upgrade to a new Spark version. > Please review my changes
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331021#comment-15331021 ] Cody Koeninger commented on SPARK-12177: [~jinx...@ebay.com] 1. I'm already looking at that test failure, will update once I know what's going on. 2. I'm really strongly against trying to hide the Kafka consumer from users for 0.10; I don't want to be in the business of anticipating all the ways people will use it, nor the ways it may change. The 0.10 consumer isn't particularly difficult to use; the most basic construction of it is just new KafkaConsumer[String, String](kafkaParams); consumer.subscribe(topics). You don't need to know anything about partitionInfo unless you want/need to. > Update KafkaDStreams to new Kafka 0.10 Consumer API > --- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 has already been released and it introduces a new consumer API that is not > compatible with the old one. So, I added the new consumer API. I made separate > classes in package org.apache.spark.streaming.kafka.v09 with the changed API. I > didn't remove the old classes, for better backward compatibility. Users will not need > to change their old Spark applications when they upgrade to a new Spark version. > Please review my changes
[jira] [Created] (SPARK-15959) Add the support of hive.metastore.warehouse.dir back
Yin Huai created SPARK-15959: Summary: Add the support of hive.metastore.warehouse.dir back Key: SPARK-15959 URL: https://issues.apache.org/jira/browse/SPARK-15959 Project: Spark Issue Type: Bug Components: SQL Reporter: Yin Huai Priority: Critical Right now, we do not load the value of this conf at all (https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSharedState.scala#L35-L41). Let's maintain backward compatibility by loading it if Spark's warehouse conf is not set.
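The proposed fallback can be sketched in plain Scala. The two conf keys are the real ones discussed in this issue, but the helper function and the default value are illustrative, not Spark's actual code:

```scala
// Hedged sketch of the proposed precedence: prefer Spark's own warehouse
// conf, fall back to Hive's hive.metastore.warehouse.dir, then to a
// built-in default. `effectiveWarehouseDir` is a hypothetical helper.
def effectiveWarehouseDir(sparkConf: Map[String, String],
                          hadoopConf: Map[String, String]): String =
  sparkConf.get("spark.sql.warehouse.dir")
    .orElse(hadoopConf.get("hive.metastore.warehouse.dir"))
    .getOrElse("spark-warehouse")

// Hive's conf is only consulted when Spark's warehouse conf is not set.
val legacy = effectiveWarehouseDir(Map.empty,
  Map("hive.metastore.warehouse.dir" -> "/user/hive/warehouse"))
```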
[jira] [Comment Edited] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331007#comment-15331007 ] Jinxia Liu edited comment on SPARK-12177 at 6/15/16 2:29 AM: - [~c...@koeninger.org] thanks for contributing the connector for Kafka 0.9 and Kafka 0.10. I used your Kafka 0.10 connector and ran into some problems, would you mind looking at them? 1. When building with "mvn clean package", there is an error about the test case in DirectKafkaStreamSuite not passing: offset recovery *** FAILED *** The code passed to eventually never returned normally. Attempted 196 times over 10.031047939 seconds. Last failure message: 55 did not equal 210. (DirectKafkaStreamSuite.scala:337) 2. Another problem (with the Kafka 0.9 connector as well): can we add a wrapper, something like CreateDirectKafkaStream as in the Kafka 0.8 connector, to wrap up the DirectKafkaStream constructor? The benefit is that the user does not need to know the Kafka consumer APIs in order to use the connector. E.g.: the Kafka consumer in the connector gets assigned a collection of TopicPartition, in most cases all the partitions for a given topic; with no wrapper, the user needs to use the Kafka consumer API to first retrieve the partitionInfo. With the wrapper, the user only needs to provide the topics, and such info can be passed to the consumer inside the wrapper without the user's knowledge. was (Author: jinx...@ebay.com): [~c...@koeninger.org] thanks for contributing the connector for Kafka 0.9 and Kafka 0.10. I used your Kafka 0.10 connector and ran into some problems, would you mind looking at them? 1. When building with "mvn clean package", there is an error about the test case in DirectKafkaStreamSuite not passing: offset recovery *** FAILED *** The code passed to eventually never returned normally. Attempted 196 times over 10.031047939 seconds. Last failure message: 55 did not equal 210. (DirectKafkaStreamSuite.scala:337) 2. 
Another problem (with the Kafka 0.9 connector as well): can we add a wrapper, something like CreateDirectKafkaStream in the Kafka 0.8 connector, to wrap up the DirectKafkaStream constructor? The benefit is that the user does not need to know the Kafka consumer APIs in order to use the connector. E.g.: the Kafka consumer in the connector gets assigned a collection of TopicPartition, in most cases all the partitions for a given topic; with no wrapper, the user needs to use the Kafka consumer API to first retrieve the partitionInfo. With the wrapper, the user only needs to provide the topics, and such info can be passed to the consumer inside the wrapper without the user's knowledge. > Update KafkaDStreams to new Kafka 0.10 Consumer API > --- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 has already been released and it introduces a new consumer API that is not > compatible with the old one. So, I added the new consumer API. I made separate > classes in package org.apache.spark.streaming.kafka.v09 with the changed API. I > didn't remove the old classes, for better backward compatibility. Users will not need > to change their old Spark applications when they upgrade to a new Spark version. > Please review my changes
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331007#comment-15331007 ] Jinxia Liu commented on SPARK-12177: [~c...@koeninger.org] thanks for contributing the connector for Kafka 0.9 and Kafka 0.10. I used your Kafka 0.10 connector and ran into some problems, would you mind looking at them? 1. When building with "mvn clean package", there is an error about the test case in DirectKafkaStreamSuite not passing: offset recovery *** FAILED *** The code passed to eventually never returned normally. Attempted 196 times over 10.031047939 seconds. Last failure message: 55 did not equal 210. (DirectKafkaStreamSuite.scala:337) 2. Another problem (with the Kafka 0.9 connector as well): can we add a wrapper, something like CreateDirectKafkaStream in the Kafka 0.8 connector, to wrap up the DirectKafkaStream constructor? The benefit is that the user does not need to know the Kafka consumer APIs in order to use the connector. E.g.: the Kafka consumer in the connector gets assigned a collection of TopicPartition, in most cases all the partitions for a given topic; with no wrapper, the user needs to use the Kafka consumer API to first retrieve the partitionInfo. With the wrapper, the user only needs to provide the topics, and such info can be passed to the consumer inside the wrapper without the user's knowledge. > Update KafkaDStreams to new Kafka 0.10 Consumer API > --- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 has already been released and it introduces a new consumer API that is not > compatible with the old one. So, I added the new consumer API. I made separate > classes in package org.apache.spark.streaming.kafka.v09 with the changed API. I > didn't remove the old classes, for better backward compatibility. Users will not need > to change their old Spark applications when they upgrade to a new Spark version. > Please review my changes
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331005#comment-15331005 ] Jinxia Liu commented on SPARK-12177: [~c...@koeninger.org] thanks for contributing the connector for Kafka 0.9 and Kafka 0.10. I used your Kafka 0.10 connector and ran into some problems, would you mind looking at them? 1. When building with "mvn clean package", there is an error about the test case in DirectKafkaStreamSuite not passing: offset recovery *** FAILED *** The code passed to eventually never returned normally. Attempted 196 times over 10.031047939 seconds. Last failure message: 55 did not equal 210. (DirectKafkaStreamSuite.scala:337) 2. Another problem (with the Kafka 0.9 connector as well): can we add a wrapper, something like CreateDirectKafkaStream in the Kafka 0.8 connector, to wrap up the DirectKafkaStream constructor? The benefit is that the user does not need to know the Kafka consumer APIs in order to use the connector. E.g.: the Kafka consumer in the connector gets assigned a collection of TopicPartition, in most cases all the partitions for a given topic; with no wrapper, the user needs to use the Kafka consumer API to first retrieve the partitionInfo. With the wrapper, the user only needs to provide the topics, and such info can be passed to the consumer inside the wrapper without the user's knowledge. > Update KafkaDStreams to new Kafka 0.10 Consumer API > --- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 has already been released and it introduces a new consumer API that is not > compatible with the old one. So, I added the new consumer API. I made separate > classes in package org.apache.spark.streaming.kafka.v09 with the changed API. I > didn't remove the old classes, for better backward compatibility. Users will not need > to change their old Spark applications when they upgrade to a new Spark version. > Please review my changes
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331003#comment-15331003 ] Cody Koeninger commented on SPARK-12177: I had verified basic functionality with a broker set up to require TLS; all that's required is setting kafkaParams appropriately. I'm a little hesitant to claim that's "secure" for anyone's particular purpose, though. E.g. enabling SSL for Spark communication (so that things like the truststore password in kafkaParams aren't sent in cleartext from the driver) would probably be a good idea as well. > Update KafkaDStreams to new Kafka 0.10 Consumer API > --- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 has already been released and it introduces a new consumer API that is not > compatible with the old one. So, I added the new consumer API. I made separate > classes in package org.apache.spark.streaming.kafka.v09 with the changed API. I > didn't remove the old classes, for better backward compatibility. Users will not need > to change their old Spark applications when they upgrade to a new Spark version. > Please review my changes
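As a rough illustration of "setting kafkaParams appropriately" for a TLS broker, the standard Kafka client SSL settings look like this. The keys are the documented Kafka consumer configs; the host, paths, and password are placeholders for your own setup:

```scala
// kafkaParams map passed through to the underlying Kafka 0.10 consumer.
// security.protocol=SSL plus the truststore settings enable TLS to the
// broker; values below are placeholders, not working credentials.
val kafkaParams: Map[String, Object] = Map(
  "bootstrap.servers"  -> "broker1:9093",  // SSL listener port
  "security.protocol"  -> "SSL",
  "ssl.truststore.location" -> "/etc/kafka/client.truststore.jks",
  "ssl.truststore.password" -> "changeit",
  "key.deserializer"   -> "org.apache.kafka.common.serialization.StringDeserializer",
  "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer"
)
```

Note Cody's caveat above: this map travels from the driver to the executors, so the truststore password is only as protected as Spark's own communication channel.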
[jira] [Resolved] (SPARK-15952) "show databases" does not get sorted result
[ https://issues.apache.org/jira/browse/SPARK-15952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-15952. - Resolution: Fixed Assignee: Bo Meng Fix Version/s: 2.0.0 > "show databases" does not get sorted result > --- > > Key: SPARK-15952 > URL: https://issues.apache.org/jira/browse/SPARK-15952 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Assignee: Bo Meng > Fix For: 2.0.0 > > > Two issues I've found with the "show databases" command: > 1. The returned database name list is not sorted; it is only sorted when "like" > is used together with it (Hive will always return a sorted list). > 2. When it is used as sql("show databases").show, it outputs a table with a > column named "result", but sql("show tables").show outputs the > column name as "tableName", so I think we should be consistent and use > "databaseName" at least. > I will make a PR shortly.
[jira] [Resolved] (SPARK-15945) Implement conversion utils in Scala/Java
[ https://issues.apache.org/jira/browse/SPARK-15945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang resolved SPARK-15945. - Resolution: Fixed > Implement conversion utils in Scala/Java > > > Key: SPARK-15945 > URL: https://issues.apache.org/jira/browse/SPARK-15945 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > This is to provide conversion utils between old/new vector columns in a > DataFrame. So users can use it to migrate their datasets and pipelines > manually.
[jira] [Resolved] (SPARK-15065) HiveSparkSubmitSuite's "set spark.sql.warehouse.dir" is flaky
[ https://issues.apache.org/jira/browse/SPARK-15065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-15065. -- Resolution: Fixed Fix Version/s: 2.0.0 > HiveSparkSubmitSuite's "set spark.sql.warehouse.dir" is flaky > - > > Key: SPARK-15065 > URL: https://issues.apache.org/jira/browse/SPARK-15065 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Reporter: Yin Huai >Priority: Critical > Fix For: 2.0.0 > > Attachments: log.txt > > > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/861/testReport/junit/org.apache.spark.sql.hive/HiveSparkSubmitSuite/dir/ > There are several WARN messages like {{16/05/02 00:51:06 WARN Master: Got > status update for unknown executor app-20160502005054-/3}}, which are > suspicious.
[jira] [Commented] (SPARK-15065) HiveSparkSubmitSuite's "set spark.sql.warehouse.dir" is flaky
[ https://issues.apache.org/jira/browse/SPARK-15065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330999#comment-15330999 ] Yin Huai commented on SPARK-15065: -- Thanks. I took a look at Jenkins' history. Looks like it is good now. > HiveSparkSubmitSuite's "set spark.sql.warehouse.dir" is flaky > - > > Key: SPARK-15065 > URL: https://issues.apache.org/jira/browse/SPARK-15065 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Reporter: Yin Huai >Priority: Critical > Attachments: log.txt > > > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/861/testReport/junit/org.apache.spark.sql.hive/HiveSparkSubmitSuite/dir/ > There are several WARN messages like {{16/05/02 00:51:06 WARN Master: Got > status update for unknown executor app-20160502005054-/3}}, which are > suspicious.
[jira] [Resolved] (SPARK-15631) Dataset and encoder bug fixes
[ https://issues.apache.org/jira/browse/SPARK-15631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-15631. -- Resolution: Fixed Fix Version/s: 2.0.0 > Dataset and encoder bug fixes > - > > Key: SPARK-15631 > URL: https://issues.apache.org/jira/browse/SPARK-15631 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian > Fix For: 2.0.0 > > > This is an umbrella ticket for various Dataset and encoder bug fixes.
[jira] [Resolved] (SPARK-12323) Don't assign default value for non-nullable columns of a Dataset
[ https://issues.apache.org/jira/browse/SPARK-12323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-12323. -- Resolution: Fixed Fix Version/s: 2.0.0 > Don't assign default value for non-nullable columns of a Dataset > > > Key: SPARK-12323 > URL: https://issues.apache.org/jira/browse/SPARK-12323 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > Fix For: 2.0.0 > > > For a field of a Dataset, if it's specified as non-nullable in the schema of > the Dataset, we shouldn't assign a default value for it if the input data > contains null. Instead, a runtime exception with a nice error message should be > thrown, asking the user to use {{Option}} or nullable types (e.g., > {{java.lang.Integer}} instead of {{scala.Int}}).
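The distinction between {{scala.Int}} and {{Option}}/{{java.lang.Integer}} mentioned above can be illustrated in plain Scala. This is a sketch of why a nullable column needs a nullable field type, not Spark's encoder code; the case classes and `decode` helper are hypothetical:

```scala
// A field typed scala.Int cannot represent null, so a decoder would have to
// invent a default (e.g. 0); Option[Int] can represent the null faithfully.
case class Strict(id: Int)          // cannot hold a null id
case class Lenient(id: Option[Int]) // None represents a null id

// Illustrative decoding of a possibly-null input value.
def decode(raw: java.lang.Integer): Lenient =
  Lenient(Option(raw).map(_.intValue))

val present = decode(42)   // id is preserved as Some(42)
val missing = decode(null) // null becomes None, not a silent default of 0
```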
[jira] [Resolved] (SPARK-15011) org.apache.spark.sql.hive.StatisticsSuite.analyze MetastoreRelations fails when hadoop 2.3 or hadoop 2.4 is used
[ https://issues.apache.org/jira/browse/SPARK-15011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-15011. - Resolution: Fixed Assignee: Herman van Hovell Fix Version/s: 2.0.0 > org.apache.spark.sql.hive.StatisticsSuite.analyze MetastoreRelations fails > when hadoop 2.3 or hadoop 2.4 is used > > > Key: SPARK-15011 > URL: https://issues.apache.org/jira/browse/SPARK-15011 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Reporter: Yin Huai >Assignee: Herman van Hovell >Priority: Critical > Labels: flaky-test > Fix For: 2.0.0 > > > Let's disable it first. > https://spark-tests.appspot.com/tests/org.apache.spark.sql.hive.StatisticsSuite/analyze%20MetastoreRelations
[jira] [Resolved] (SPARK-15933) Refactor reader-writer interface for streaming DFs to use DataStreamReader/Writer
[ https://issues.apache.org/jira/browse/SPARK-15933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-15933. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13653 [https://github.com/apache/spark/pull/13653] > Refactor reader-writer interface for streaming DFs to use > DataStreamReader/Writer > - > > Key: SPARK-15933 > URL: https://issues.apache.org/jira/browse/SPARK-15933 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > Fix For: 2.0.0 > > > Currently, DataFrameReader/Writer has methods that are needed for both > streaming and non-streaming DFs. This is quite awkward because each method > throws a runtime exception for one case or the other. So rather than having > half the methods throw runtime exceptions, it's better to have a separate > reader/writer API for streams.
[jira] [Created] (SPARK-15958) Make initial buffer size for the Sorter configurable
Sital Kedia created SPARK-15958: --- Summary: Make initial buffer size for the Sorter configurable Key: SPARK-15958 URL: https://issues.apache.org/jira/browse/SPARK-15958 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: Sital Kedia Currently the initial buffer size in the sorter is hard-coded (https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java#L88) and is too small for large workloads. As a result, the sorter spends significant time expanding the buffer and copying the data. It would be useful to make it configurable.
[jira] [Updated] (SPARK-15957) RFormula supports forcing to index label
[ https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-15957: Assignee: (was: Yanbo Liang) > RFormula supports forcing to index label > > > Key: SPARK-15957 > URL: https://issues.apache.org/jira/browse/SPARK-15957 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang > > Currently, RFormula indexes the label only when it is of string type. If the > label is of numeric type and we use RFormula to build a classification model, > there are no label attributes in the label column metadata. The label > attributes are useful when making predictions for classification, so we can > force indexing of the label with {{StringIndexer}} whether it is numeric or > string type. Then SparkR wrappers can extract label attributes from the label > column metadata successfully. This feature can help us fix bugs similar to > SPARK-15153. > For regression, we will still keep the label as a numeric type. > In this PR, we add a param indexLabel to control whether to force indexing of > the label for RFormula.
[jira] [Updated] (SPARK-15957) RFormula supports forcing to index label
[ https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-15957: Description: RFormula will index label only when it is string type currently. If the label is numeric type and we use RFormula to present a classification model, there is no label attributes in label column metadata. The label attributes are useful when making prediction for classification, so we can force to index label by {{StringIndexer}} whether it is numeric or string type for classification. Then SparkR wrappers can extract label attributes from label column metadata successfully. This feature can help us to fix bug similar with SPARK-15153. For regression, we will still to keep label as numeric type. In this PR, we add a param indexLabel to control whether to force to index label for RFormula. was: RFormula will index label only when it is string type. If the label is numeric type and we use RFormula to present a classification model, we can not extract label attributes from the label column metadata successfully. The label attributes are useful when make prediction for classification, so we can force to index label by {{StringIndexer}} whether it is numeric or string type for classification. Then SparkR wrappers can extract label attributes from the column metadata successfully. This feature can help us to fix bug similar with SPARK-15153. For regression, we will still to keep label as numeric type. We should add a param to control whether to force to index label for RFormula. > RFormula supports forcing to index label > > > Key: SPARK-15957 > URL: https://issues.apache.org/jira/browse/SPARK-15957 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Assignee: Yanbo Liang > > RFormula will index label only when it is string type currently. 
If the label > is of numeric type and we use RFormula to build a classification model, there > are no label attributes in the label column metadata. The label attributes are > useful when making predictions for classification, so we can force indexing of > the label with {{StringIndexer}} whether it is numeric or string type. Then > SparkR wrappers can extract label attributes from the label column metadata > successfully. This feature can help us fix bugs similar to SPARK-15153. > For regression, we will still keep the label as a numeric type. > In this PR, we add a param indexLabel to control whether to force indexing of > the label for RFormula.
[jira] [Assigned] (SPARK-15956) When unwrapping ORC avoid pattern matching at runtime
[ https://issues.apache.org/jira/browse/SPARK-15956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15956: Assignee: Apache Spark > When unwrapping ORC avoid pattern matching at runtime > - > > Key: SPARK-15956 > URL: https://issues.apache.org/jira/browse/SPARK-15956 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Brian Cho >Assignee: Apache Spark >Priority: Minor > > When unwrapping ORC values, pattern matching for each data value at runtime > hurts performance. This should be avoided. > Instead, we can run pattern matching once and return a function that is > subsequently used to unwrap each data value. This is already implemented for > certain primitive types. We should implement it for the remaining types, > including complex types (e.g., list, map).
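The "match once, return a function" idea can be sketched in plain Scala (a hedged illustration; the {{OrcType}} descriptors below are made-up stand-ins, not Spark's ORC object-inspector API): instead of pattern matching on the type for every value, we match once and return a closure that is reused for each value.

```scala
// Illustrative stand-ins for ORC type descriptors; not Spark's real API.
sealed trait OrcType
case object OrcInt extends OrcType
case object OrcString extends OrcType

// Slow path: a pattern match runs for every single value.
def unwrapEach(t: OrcType, v: Any): Any = t match {
  case OrcInt    => v.asInstanceOf[Integer].intValue
  case OrcString => v.toString
}

// Proposed path: match once, get back an unwrapper reused per row.
def unwrapperFor(t: OrcType): Any => Any = t match {
  case OrcInt    => (v: Any) => v.asInstanceOf[Integer].intValue
  case OrcString => (v: Any) => v.toString
}

val unwrapInt = unwrapperFor(OrcInt) // one match, then reused for all values
```

The source notes this pattern already exists for certain primitive types; the proposal extends it to the remaining types, including complex ones such as list and map.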
[jira] [Assigned] (SPARK-15956) When unwrapping ORC avoid pattern matching at runtime
[ https://issues.apache.org/jira/browse/SPARK-15956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15956: Assignee: (was: Apache Spark) > When unwrapping ORC avoid pattern matching at runtime > - > > Key: SPARK-15956 > URL: https://issues.apache.org/jira/browse/SPARK-15956 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Brian Cho >Priority: Minor > > When unwrapping ORC values, pattern matching for each data value at runtime > hurts performance. This should be avoided. > Instead, we can run pattern matching once and return a function that is > subsequently used to unwrap each data value. This is already implemented for > certain primitive types. We should implement for the remaining types, > including complex types (e.g, list, map). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15956) When unwrapping ORC avoid pattern matching at runtime
[ https://issues.apache.org/jira/browse/SPARK-15956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330946#comment-15330946 ] Apache Spark commented on SPARK-15956: -- User 'dafrista' has created a pull request for this issue: https://github.com/apache/spark/pull/13676 > When unwrapping ORC avoid pattern matching at runtime > - > > Key: SPARK-15956 > URL: https://issues.apache.org/jira/browse/SPARK-15956 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Brian Cho >Priority: Minor > > When unwrapping ORC values, pattern matching for each data value at runtime > hurts performance. This should be avoided. > Instead, we can run pattern matching once and return a function that is > subsequently used to unwrap each data value. This is already implemented for > certain primitive types. We should implement for the remaining types, > including complex types (e.g, list, map). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3451) spark-submit should support specifying glob wildcards in the --jars CLI option
[ https://issues.apache.org/jira/browse/SPARK-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330945#comment-15330945 ] Kun Liu commented on SPARK-3451: I vote for this feature. If no one else takes it, I may try it as my first contribution to the Spark community. > spark-submit should support specifying glob wildcards in the --jars CLI option > -- > > Key: SPARK-3451 > URL: https://issues.apache.org/jira/browse/SPARK-3451 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Affects Versions: 1.0.2 >Reporter: wolfgang hoschek >Priority: Minor > > spark-submit should support specifying glob wildcards in the --jars CLI > option, e.g. --jars /opt/myapp/*.jar > This would simplify usage for enterprise customers, for example in > combination with being able to specify --jars multiple times as described in > https://issues.apache.org/jira/browse/SPARK-3450, like so: > {code} > my-spark-submit.sh: > spark-submit --jars /opt/myapp/*.jar "$@" > {code} > Example usage: > {code} > my-spark-submit.sh --jars myUserDefinedFunction.jar > {code} > The relevant enhancement code might go into SparkSubmitArguments.
[jira] [Updated] (SPARK-15957) RFormula supports forcing to index label
[ https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-15957: Description: RFormula will index label only when it is string type. If the label is numeric type and we use RFormula to present a classification model, we can not extract label attributes from the label column metadata successfully. The label attributes are useful when make prediction for classification, so we can force to index label by {{StringIndexer}} whether it is numeric or string type for classification. Then SparkR wrappers can extract label attributes from the column metadata successfully. This feature can help us to fix bug similar with SPARK-15153. For regression, we will still to keep label as numeric type. We should add a param to control whether to force to index label for RFormula. was: RFormula will index label only when it is string type. If the label is numeric type and we use RFormula to present a classification model, we can not extract label attributes from the label column metadata successfully. The label attributes are useful when make prediction for classification, so we can force to index label by {StringIndexer} whether it is numeric or string type for classification. Then SparkR wrappers can extract label attributes from the column metadata successfully. This feature can help us to fix bug similar with SPARK-15153. For regression, we will still to keep label as numeric type. We should add a param to control whether to force to index label for RFormula. > RFormula supports forcing to index label > > > Key: SPARK-15957 > URL: https://issues.apache.org/jira/browse/SPARK-15957 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Assignee: Yanbo Liang > > RFormula will index label only when it is string type. 
If the label is > numeric type and we use RFormula to present a classification model, we can > not extract label attributes from the label column metadata successfully. The > label attributes are useful when make prediction for classification, so we > can force to index label by {{StringIndexer}} whether it is numeric or string > type for classification. Then SparkR wrappers can extract label attributes > from the column metadata successfully. This feature can help us to fix bug > similar with SPARK-15153. > For regression, we will still to keep label as numeric type. > We should add a param to control whether to force to index label for RFormula. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15957) RFormula supports forcing to index label
[ https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-15957: Description: RFormula will index label only when it is string type. If the label is numeric type and we use RFormula to present a classification model, we can not extract label attributes from the label column metadata successfully. The label attributes are useful when make prediction for classification, so we can force to index label by {StringIndexer} whether it is numeric or string type for classification. Then SparkR wrappers can extract label attributes from the column metadata successfully. This feature can help us to fix bug similar with SPARK-15153. For regression, we will still to keep label as numeric type. We should add a param to control whether to force to index label for RFormula. was: RFormula will index label only when it is string type. If the label is numeric type and we use RFormula to present a classification model, we can not extract label attributes from the label column metadata successfully. The label attributes are useful, so we can force to index label whether it is numeric or string type for classification. Then SparkR wrappers can extract label attributes from the column metadata successfully. This feature can help us to fix bug similar with SPARK-15153. For regression, we will still to keep numeric type. We should add a param to control whether to force to index label for RFormula. > RFormula supports forcing to index label > > > Key: SPARK-15957 > URL: https://issues.apache.org/jira/browse/SPARK-15957 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Assignee: Yanbo Liang > > RFormula will index label only when it is string type. If the label is > numeric type and we use RFormula to present a classification model, we can > not extract label attributes from the label column metadata successfully. 
The > label attributes are useful when make prediction for classification, so we > can force to index label by {StringIndexer} whether it is numeric or string > type for classification. Then SparkR wrappers can extract label attributes > from the column metadata successfully. This feature can help us to fix bug > similar with SPARK-15153. > For regression, we will still to keep label as numeric type. > We should add a param to control whether to force to index label for RFormula. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15957) RFormula supports forcing to index label
[ https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-15957: Description: RFormula will index label only when it is string type. If the label is numeric type and we use RFormula to present a classification model, we can not extract label attributes from the label column metadata successfully. The label attributes are useful, so we can force to index label whether it is numeric or string type for classification. Then SparkR wrappers can extract label attributes from the column metadata successfully. This feature can help us to fix bug similar with SPARK-15153. For regression, we will still to keep numeric type. We should add a param to control whether to force to index label for RFormula. was:Add param to make users can force to index label whether it is numeric or string. For classification algorithms, we force to index label by setting it with true. > RFormula supports forcing to index label > > > Key: SPARK-15957 > URL: https://issues.apache.org/jira/browse/SPARK-15957 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Assignee: Yanbo Liang > > RFormula will index label only when it is string type. If the label is > numeric type and we use RFormula to present a classification model, we can > not extract label attributes from the label column metadata successfully. The > label attributes are useful, so we can force to index label whether it is > numeric or string type for classification. Then SparkR wrappers can extract > label attributes from the column metadata successfully. This feature can help > us to fix bug similar with SPARK-15153. > For regression, we will still to keep numeric type. > We should add a param to control whether to force to index label for RFormula. 
[jira] [Resolved] (SPARK-15927) Eliminate redundant code in DAGScheduler's getParentStages and getAncestorShuffleDependencies methods.
[ https://issues.apache.org/jira/browse/SPARK-15927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout resolved SPARK-15927. Resolution: Fixed Fix Version/s: 2.1.0 Fixed by https://github.com/apache/spark/commit/5d50d4f0f9db3e6cc7c51e35cdb2d12daa4fd108 > Eliminate redundant code in DAGScheduler's getParentStages and > getAncestorShuffleDependencies methods. > -- > > Key: SPARK-15927 > URL: https://issues.apache.org/jira/browse/SPARK-15927 > Project: Spark > Issue Type: Sub-task > Components: Scheduler >Affects Versions: 2.0.0 >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > Fix For: 2.1.0 > > > The getParentStages and getAncestorShuffleDependencies methods have a lot of > repeated code to traverse the dependency graph. We should create a function > that they can both call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15957) RFormula supports forcing to index label
[ https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang reassigned SPARK-15957: --- Assignee: Yanbo Liang (was: Apache Spark) > RFormula supports forcing to index label > > > Key: SPARK-15957 > URL: https://issues.apache.org/jira/browse/SPARK-15957 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Assignee: Yanbo Liang > > Add param to make users can force to index label whether it is numeric or > string. For classification algorithms, we force to index label by setting it > with true. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15957) RFormula supports forcing to index label
[ https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15957: Assignee: Apache Spark > RFormula supports forcing to index label > > > Key: SPARK-15957 > URL: https://issues.apache.org/jira/browse/SPARK-15957 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Assignee: Apache Spark > > Add param to make users can force to index label whether it is numeric or > string. For classification algorithms, we force to index label by setting it > with true. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15957) RFormula supports forcing to index label
[ https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15957: Assignee: (was: Apache Spark) > RFormula supports forcing to index label > > > Key: SPARK-15957 > URL: https://issues.apache.org/jira/browse/SPARK-15957 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang > > Add param to make users can force to index label whether it is numeric or > string. For classification algorithms, we force to index label by setting it > with true. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15957) RFormula supports forcing to index label
[ https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330896#comment-15330896 ] Apache Spark commented on SPARK-15957: -- User 'yanboliang' has created a pull request for this issue: https://github.com/apache/spark/pull/13675 > RFormula supports forcing to index label > > > Key: SPARK-15957 > URL: https://issues.apache.org/jira/browse/SPARK-15957 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang > > Add param to make users can force to index label whether it is numeric or > string. For classification algorithms, we force to index label by setting it > with true. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15957) RFormula supports forcing to index label
[ https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15957: Assignee: (was: Apache Spark) > RFormula supports forcing to index label > > > Key: SPARK-15957 > URL: https://issues.apache.org/jira/browse/SPARK-15957 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang > > Add param to make users can force to index label whether it is numeric or > string. For classification algorithms, we force to index label by setting it > with true. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15957) RFormula supports forcing to index label
Yanbo Liang created SPARK-15957: --- Summary: RFormula supports forcing to index label Key: SPARK-15957 URL: https://issues.apache.org/jira/browse/SPARK-15957 Project: Spark Issue Type: Improvement Components: ML Reporter: Yanbo Liang Add a param so that users can force the label to be indexed whether it is numeric or string. For classification algorithms, we force label indexing by setting it to true.
[jira] [Created] (SPARK-15956) When unwrapping ORC avoid pattern matching at runtime
Brian Cho created SPARK-15956: - Summary: When unwrapping ORC avoid pattern matching at runtime Key: SPARK-15956 URL: https://issues.apache.org/jira/browse/SPARK-15956 Project: Spark Issue Type: Improvement Components: SQL Reporter: Brian Cho Priority: Minor When unwrapping ORC values, pattern matching for each data value at runtime hurts performance. This should be avoided. Instead, we can run pattern matching once and return a function that is subsequently used to unwrap each data value. This is already implemented for certain primitive types. We should implement it for the remaining types, including complex types (e.g., list, map).
[jira] [Assigned] (SPARK-15954) TestHive has issues being used in PySpark
[ https://issues.apache.org/jira/browse/SPARK-15954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15954: Assignee: Apache Spark > TestHive has issues being used in PySpark > - > > Key: SPARK-15954 > URL: https://issues.apache.org/jira/browse/SPARK-15954 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Reporter: holdenk >Assignee: Apache Spark > > SPARK-15745 made TestHive unreliable from PySpark test cases; to support it, > we should allow both resource-based and system-property-based lookup for > loading the hive file.
[jira] [Commented] (SPARK-15954) TestHive has issues being used in PySpark
[ https://issues.apache.org/jira/browse/SPARK-15954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330822#comment-15330822 ] Apache Spark commented on SPARK-15954: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/12938 > TestHive has issues being used in PySpark > - > > Key: SPARK-15954 > URL: https://issues.apache.org/jira/browse/SPARK-15954 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Reporter: holdenk > > SPARK-15745 made TestHive unreliable from PySpark test cases, to support it > we should allow both resource or system property based lookup for loading the > hive file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15954) TestHive has issues being used in PySpark
[ https://issues.apache.org/jira/browse/SPARK-15954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15954: Assignee: (was: Apache Spark) > TestHive has issues being used in PySpark > - > > Key: SPARK-15954 > URL: https://issues.apache.org/jira/browse/SPARK-15954 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Reporter: holdenk > > SPARK-15745 made TestHive unreliable from PySpark test cases, to support it > we should allow both resource or system property based lookup for loading the > hive file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15955) Failed Spark application returns with exitcode equals to zero
[ https://issues.apache.org/jira/browse/SPARK-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated SPARK-15955: --- Summary: Failed Spark application returns with exitcode equals to zero (was: Failed Spark application returns with client console equals zero) > Failed Spark application returns with exitcode equals to zero > - > > Key: SPARK-15955 > URL: https://issues.apache.org/jira/browse/SPARK-15955 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Yesha Vora > > Scenario: > * Set up cluster with wire-encryption enabled. > * set 'spark.authenticate.enableSaslEncryption' = 'false' and > 'spark.shuffle.service.enabled' :'true' > * run sparkPi application. > {code} > client token: Token { kind: YARN_CLIENT_TOKEN, service: } > diagnostics: Max number of executor failures (3) reached > ApplicationMaster host: xx.xx.xx.xxx > ApplicationMaster RPC port: 0 > queue: default > start time: 1465941051976 > final status: FAILED > tracking URL: https://xx.xx.xx.xxx:8090/proxy/application_1465925772890_0016/ > user: hrt_qa > Exception in thread "main" org.apache.spark.SparkException: Application > application_1465925772890_0016 finished with failed status > at org.apache.spark.deploy.yarn.Client.run(Client.scala:1092) > at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1139) > at org.apache.spark.deploy.yarn.Client.main(Client.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > INFO ShutdownHookManager: Shutdown hook called{code} > This Spark application exits with exit code 0. A failed application should not > return exit code 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
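The practical impact of the mismatch above is that wrapper tooling judges a spark-submit run by its exit code alone. A minimal Python sketch of such a wrapper (the commands here are placeholder subprocesses standing in for a real spark-submit invocation):

```python
import subprocess
import sys

def job_succeeded(cmd):
    """Run a command and report success strictly by its exit code.

    Schedulers and CI systems typically judge spark-submit this way,
    which is why a FAILED application must not exit with code 0.
    """
    return subprocess.run(cmd).returncode == 0

# Stand-ins for a passing and a failing driver process.
ok_cmd = [sys.executable, "-c", "pass"]
failed_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
```

With the bug described in this issue, a wrapper like this would report success for an application whose final YARN status was FAILED.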
[jira] [Commented] (SPARK-15954) TestHive has issues being used in PySpark
[ https://issues.apache.org/jira/browse/SPARK-15954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330821#comment-15330821 ] holdenk commented on SPARK-15954: - See related PR https://github.com/apache/spark/pull/12938 > TestHive has issues being used in PySpark > - > > Key: SPARK-15954 > URL: https://issues.apache.org/jira/browse/SPARK-15954 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Reporter: holdenk > > SPARK-15745 made TestHive unreliable from PySpark test cases; to support it > we should allow either resource-based or system-property-based lookup for loading the > Hive file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15955) Failed Spark application returns with client console equals zero
Yesha Vora created SPARK-15955: -- Summary: Failed Spark application returns with client console equals zero Key: SPARK-15955 URL: https://issues.apache.org/jira/browse/SPARK-15955 Project: Spark Issue Type: Bug Affects Versions: 1.6.1 Reporter: Yesha Vora Scenario: * Set up cluster with wire-encryption enabled. * set 'spark.authenticate.enableSaslEncryption' = 'false' and 'spark.shuffle.service.enabled' :'true' * run sparkPi application. {code} client token: Token { kind: YARN_CLIENT_TOKEN, service: } diagnostics: Max number of executor failures (3) reached ApplicationMaster host: xx.xx.xx.xxx ApplicationMaster RPC port: 0 queue: default start time: 1465941051976 final status: FAILED tracking URL: https://xx.xx.xx.xxx:8090/proxy/application_1465925772890_0016/ user: hrt_qa Exception in thread "main" org.apache.spark.SparkException: Application application_1465925772890_0016 finished with failed status at org.apache.spark.deploy.yarn.Client.run(Client.scala:1092) at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1139) at org.apache.spark.deploy.yarn.Client.main(Client.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) INFO ShutdownHookManager: Shutdown hook called{code} This spark application exits with exitcode = 0. 
A failed application should not return exit code 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15954) TestHive has issues being used in PySpark
holdenk created SPARK-15954: --- Summary: TestHive has issues being used in PySpark Key: SPARK-15954 URL: https://issues.apache.org/jira/browse/SPARK-15954 Project: Spark Issue Type: Bug Components: PySpark, SQL Reporter: holdenk SPARK-15745 made TestHive unreliable from PySpark test cases; to support it we should allow either resource-based or system-property-based lookup for loading the Hive file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
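The dual lookup suggested above can be sketched as override-then-fallback resolution. On the JVM the override would be a system property; an explicit mapping stands in for it here, and both the key name and the default path are assumptions, not Spark's actual ones:

```python
import os

def find_hive_test_file(env=None, default="test-resources/hive-site.xml"):
    """Resolve the Hive test file: an explicit override wins, otherwise
    fall back to the bundled resource.

    'SPARK_TESTHIVE_FILE' and the default path are illustrative names only.
    """
    env = os.environ if env is None else env
    return env.get("SPARK_TESTHIVE_FILE") or default
```

A PySpark test process that cannot see the packaged resource could then point the override at an absolute path instead.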
[jira] [Assigned] (SPARK-15953) Renamed ContinuousQuery to StreamingQuery for simplicity
[ https://issues.apache.org/jira/browse/SPARK-15953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15953: Assignee: Apache Spark (was: Tathagata Das) > Renamed ContinuousQuery to StreamingQuery for simplicity > > > Key: SPARK-15953 > URL: https://issues.apache.org/jira/browse/SPARK-15953 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Apache Spark > > Make the API more intuitive by removing the term "Continuous". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15953) Renamed ContinuousQuery to StreamingQuery for simplicity
[ https://issues.apache.org/jira/browse/SPARK-15953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15953: Assignee: Tathagata Das (was: Apache Spark) > Renamed ContinuousQuery to StreamingQuery for simplicity > > > Key: SPARK-15953 > URL: https://issues.apache.org/jira/browse/SPARK-15953 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > Make the API more intuitive by removing the term "Continuous". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15953) Renamed ContinuousQuery to StreamingQuery for simplicity
[ https://issues.apache.org/jira/browse/SPARK-15953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330805#comment-15330805 ] Apache Spark commented on SPARK-15953: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/13673 > Renamed ContinuousQuery to StreamingQuery for simplicity > > > Key: SPARK-15953 > URL: https://issues.apache.org/jira/browse/SPARK-15953 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > Make the API more intuitive by removing the term "Continuous". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15953) Renamed ContinuousQuery to StreamingQuery for simplicity
[ https://issues.apache.org/jira/browse/SPARK-15953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15953: Assignee: Apache Spark (was: Tathagata Das) > Renamed ContinuousQuery to StreamingQuery for simplicity > > > Key: SPARK-15953 > URL: https://issues.apache.org/jira/browse/SPARK-15953 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Apache Spark > > Make the API more intuitive by removing the term "Continuous". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15953) Renamed ContinuousQuery to StreamingQuery for simplicity
[ https://issues.apache.org/jira/browse/SPARK-15953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15953: Assignee: Tathagata Das (was: Apache Spark) > Renamed ContinuousQuery to StreamingQuery for simplicity > > > Key: SPARK-15953 > URL: https://issues.apache.org/jira/browse/SPARK-15953 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > Make the API more intuitive by removing the term "Continuous". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15953) Renamed ContinuousQuery to StreamingQuery for simplicity
[ https://issues.apache.org/jira/browse/SPARK-15953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-15953: -- Target Version/s: 2.0.0 > Renamed ContinuousQuery to StreamingQuery for simplicity > > > Key: SPARK-15953 > URL: https://issues.apache.org/jira/browse/SPARK-15953 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > Make the API more intuitive by removing the term "Continuous". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15953) Renamed ContinuousQuery to StreamingQuery for simplicity
Tathagata Das created SPARK-15953: - Summary: Renamed ContinuousQuery to StreamingQuery for simplicity Key: SPARK-15953 URL: https://issues.apache.org/jira/browse/SPARK-15953 Project: Spark Issue Type: Sub-task Components: SQL, Streaming Reporter: Tathagata Das Assignee: Tathagata Das Make the API more intuitive by removing the term "Continuous". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15933) Refactor reader-writer interface for streaming DFs to use DataStreamReader/Writer
[ https://issues.apache.org/jira/browse/SPARK-15933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-15933: -- Issue Type: Sub-task (was: Bug) Parent: SPARK-8360 > Refactor reader-writer interface for streaming DFs to use > DataStreamReader/Writer > - > > Key: SPARK-15933 > URL: https://issues.apache.org/jira/browse/SPARK-15933 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > Currently, the DataFrameReader/Writer has methods that are needed for > streaming and non-streaming DFs. This is quite awkward because each method in > them throws a runtime exception for one case or the other. So rather than having > half the methods throw runtime exceptions, it's just better to have a > different reader/writer API for streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
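The awkwardness described above — one class whose methods throw at runtime when called in the wrong mode — versus the proposed split can be shown in miniature. Class and method names below are illustrative, not Spark's actual signatures:

```python
# One combined reader: misuse only fails at runtime.
class CombinedReader:
    def __init__(self, streaming):
        self.streaming = streaming

    def load(self, path):
        if self.streaming:
            raise RuntimeError("load() is not supported on streaming data")
        return ("batch", path)

    def start(self, path):
        if not self.streaming:
            raise RuntimeError("start() is not supported on batch data")
        return ("stream", path)

# Separate readers: each type exposes only the methods that make sense,
# so the wrong call simply does not exist on the object.
class BatchReader:
    def load(self, path):
        return ("batch", path)

class StreamReader:
    def start(self, path):
        return ("stream", path)
```

The split moves the error from a runtime exception to something the API surface (and, in Scala, the compiler) rules out entirely.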
[jira] [Commented] (SPARK-15046) When running hive-thriftserver with yarn on a secure cluster the workers fail with java.lang.NumberFormatException
[ https://issues.apache.org/jira/browse/SPARK-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330723#comment-15330723 ] Apache Spark commented on SPARK-15046: -- User 'vanzin' has created a pull request for this issue: https://github.com/apache/spark/pull/13669 > When running hive-thriftserver with yarn on a secure cluster the workers fail > with java.lang.NumberFormatException > -- > > Key: SPARK-15046 > URL: https://issues.apache.org/jira/browse/SPARK-15046 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 2.0.0 >Reporter: Trystan Leftwich >Priority: Blocker > > When running hive-thriftserver with yarn on a secure cluster > (spark.yarn.principal and spark.yarn.keytab are set) the workers fail with > the following error. > {code} > 16/04/30 22:40:50 ERROR yarn.ApplicationMaster: Uncaught exception: > java.lang.NumberFormatException: For input string: "86400079ms" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:441) > at java.lang.Long.parseLong(Long.java:483) > at > scala.collection.immutable.StringLike$class.toLong(StringLike.scala:276) > at scala.collection.immutable.StringOps.toLong(StringOps.scala:29) > at > org.apache.spark.SparkConf$$anonfun$getLong$2.apply(SparkConf.scala:380) > at > org.apache.spark.SparkConf$$anonfun$getLong$2.apply(SparkConf.scala:380) > at scala.Option.map(Option.scala:146) > at org.apache.spark.SparkConf.getLong(SparkConf.scala:380) > at > org.apache.spark.deploy.SparkHadoopUtil.getTimeFromNowToRenewal(SparkHadoopUtil.scala:289) > at > org.apache.spark.deploy.yarn.AMDelegationTokenRenewer.org$apache$spark$deploy$yarn$AMDelegationTokenRenewer$$scheduleRenewal$1(AMDelegationTokenRenewer.scala:89) > at > org.apache.spark.deploy.yarn.AMDelegationTokenRenewer.scheduleLoginFromKeytab(AMDelegationTokenRenewer.scala:121) > at > 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$3.apply(ApplicationMaster.scala:243) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$3.apply(ApplicationMaster.scala:243) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:243) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:723) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66) > at > org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:721) > at > org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:748) > at > org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
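The stack trace above comes from reading back a config value that was written with a time-unit suffix ("86400079ms") using a plain integer parser. A Python stand-in for the failure, alongside a suffix-aware parser (the unit table and bare-number-as-milliseconds rule are assumptions for illustration):

```python
import re

def parse_long(s):
    """Plain integer parse: fails on '86400079ms' just as Long.parseLong does."""
    return int(s)

_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}

def parse_time_as_ms(s):
    """Suffix-aware parse: bare numbers are treated as milliseconds."""
    m = re.fullmatch(r"(\d+)(ms|s|m|h|d)?", s.strip())
    if m is None:
        raise ValueError(f"invalid duration: {s!r}")
    return int(m.group(1)) * _UNIT_MS[m.group(2) or "ms"]
```

The fix direction is to read such values through a duration-aware accessor rather than a raw long accessor.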
[jira] [Assigned] (SPARK-15741) PySpark Cleanup of _setDefault with seed=None
[ https://issues.apache.org/jira/browse/SPARK-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15741: Assignee: (was: Apache Spark) > PySpark Cleanup of _setDefault with seed=None > - > > Key: SPARK-15741 > URL: https://issues.apache.org/jira/browse/SPARK-15741 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Reporter: Bryan Cutler >Priority: Minor > > Several places in PySpark ML have Params._setDefault with a seed param equal > to {{None}}. This is unnecessary as it will translate to a {{0}} even though > the param has a fixed value based on the hashed classname by default. > Currently, the ALS doc test output depends on this happening and would be > clearer and more stable if it was explicitly set to {{0}}. These should be > cleaned up for stability and consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15741) PySpark Cleanup of _setDefault with seed=None
[ https://issues.apache.org/jira/browse/SPARK-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15741: Assignee: Apache Spark > PySpark Cleanup of _setDefault with seed=None > - > > Key: SPARK-15741 > URL: https://issues.apache.org/jira/browse/SPARK-15741 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Reporter: Bryan Cutler >Assignee: Apache Spark >Priority: Minor > > Several places in PySpark ML have Params._setDefault with a seed param equal > to {{None}}. This is unnecessary as it will translate to a {{0}} even though > the param has a fixed value based on the hashed classname by default. > Currently, the ALS doc test output depends on this happening and would be > clearer and more stable if it was explicitly set to {{0}}. These should be > cleaned up for stability and consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15741) PySpark Cleanup of _setDefault with seed=None
[ https://issues.apache.org/jira/browse/SPARK-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330696#comment-15330696 ] Apache Spark commented on SPARK-15741: -- User 'BryanCutler' has created a pull request for this issue: https://github.com/apache/spark/pull/13672 > PySpark Cleanup of _setDefault with seed=None > - > > Key: SPARK-15741 > URL: https://issues.apache.org/jira/browse/SPARK-15741 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Reporter: Bryan Cutler >Priority: Minor > > Several places in PySpark ML have Params._setDefault with a seed param equal > to {{None}}. This is unnecessary as it will translate to a {{0}} even though > the param has a fixed value based on the hashed classname by default. > Currently, the ALS doc test output depends on this happening and would be > clearer and more stable if it was explicitly set to {{0}}. These should be > cleaned up for stability and consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330675#comment-15330675 ] Mark Grover commented on SPARK-12177: - bq. I can rename it to spark-streaming-kafka-0-10 to match the change made for the 0.8 consumer Thanks! Mark have you (or anyone else) actually tried this PR out using TLS? bq. No, I haven't, sorry. > Update KafkaDStreams to new Kafka 0.10 Consumer API > --- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 has already been released and it introduces a new consumer API that is not > compatible with the old one. So, I added the new consumer API as separate > classes in package org.apache.spark.streaming.kafka.v09 with the changed API. I > didn't remove the old classes, for backward compatibility. Users will not need > to change their old Spark applications when they upgrade to the new Spark version. > Please review my changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15952) "show databases" does not get sorted result
[ https://issues.apache.org/jira/browse/SPARK-15952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15952: Assignee: (was: Apache Spark) > "show databases" does not get sorted result > --- > > Key: SPARK-15952 > URL: https://issues.apache.org/jira/browse/SPARK-15952 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng > > Two issues I've found with the "show databases" command: > 1. The returned database name list is not sorted; it is only sorted when "like" > is used together with it (Hive will always return a sorted list). > 2. When it is used as sql("show databases").show, it will output a table with > a column named "result", but sql("show tables").show will output the > column name "tableName", so I think we should be consistent and use > "databaseName" at least. > I will make a PR shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15952) "show databases" does not get sorted result
[ https://issues.apache.org/jira/browse/SPARK-15952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330663#comment-15330663 ] Apache Spark commented on SPARK-15952: -- User 'bomeng' has created a pull request for this issue: https://github.com/apache/spark/pull/13671 > "show databases" does not get sorted result > --- > > Key: SPARK-15952 > URL: https://issues.apache.org/jira/browse/SPARK-15952 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng > > Two issues I've found with the "show databases" command: > 1. The returned database name list is not sorted; it is only sorted when "like" > is used together with it (Hive will always return a sorted list). > 2. When it is used as sql("show databases").show, it will output a table with > a column named "result", but sql("show tables").show will output the > column name "tableName", so I think we should be consistent and use > "databaseName" at least. > I will make a PR shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15952) "show databases" does not get sorted result
[ https://issues.apache.org/jira/browse/SPARK-15952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15952: Assignee: Apache Spark > "show databases" does not get sorted result > --- > > Key: SPARK-15952 > URL: https://issues.apache.org/jira/browse/SPARK-15952 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Assignee: Apache Spark > > Two issues I've found with the "show databases" command: > 1. The returned database name list is not sorted; it is only sorted when "like" > is used together with it (Hive will always return a sorted list). > 2. When it is used as sql("show databases").show, it will output a table with > a column named "result", but sql("show tables").show will output the > column name "tableName", so I think we should be consistent and use > "databaseName" at least. > I will make a PR shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15741) PySpark Cleanup of _setDefault with seed=None
[ https://issues.apache.org/jira/browse/SPARK-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-15741: - Description: Several places in PySpark ML have Params._setDefault with a seed param equal to {{None}}. This is unnecessary as it will translate to a {{0}} even though the param has a fixed value based on the hashed classname by default. Currently, the ALS doc test output depends on this happening and would be clearer and more stable if it was explicitly set to {{0}}. These should be cleaned up for stability and consistency. (was: Calling Params._setDefault with a param equal to {{None}} will be ignored internally silently. There are several cases where this is done with the {{seed}} param, making it seem like it might do something. These cases should be removed for the sake of consistency.) > PySpark Cleanup of _setDefault with seed=None > - > > Key: SPARK-15741 > URL: https://issues.apache.org/jira/browse/SPARK-15741 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Reporter: Bryan Cutler >Priority: Minor > > Several places in PySpark ML have Params._setDefault with a seed param equal > to {{None}}. This is unnecessary as it will translate to a {{0}} even though > the param has a fixed value based on the hashed classname by default. > Currently, the ALS doc test output depends on this happening and would be > clearer and more stable if it was explicitly set to {{0}}. These should be > cleaned up for stability and consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
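The silently-ignored None default described in this issue can be illustrated with a simplified stand-in for Params._setDefault. This mirrors the behavior as described, not PySpark's actual implementation:

```python
class Params:
    """Toy version of the param-default holder discussed above."""

    def __init__(self):
        self._defaultParamMap = {}

    def _setDefault(self, **kwargs):
        for name, value in kwargs.items():
            if value is not None:   # None values are dropped silently
                self._defaultParamMap[name] = value
        return self

p = Params()
p._setDefault(seed=None)   # looks meaningful, but has no effect
p._setDefault(seed=0)      # the explicit, stable default proposed above
```

The cleanup is then mechanical: delete the `seed=None` calls or replace them with an explicit value.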
[jira] [Reopened] (SPARK-15741) PySpark Cleanup of _setDefault with seed=None
[ https://issues.apache.org/jira/browse/SPARK-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reopened SPARK-15741: -- Reopened as I feel this still should be cleaned up. > PySpark Cleanup of _setDefault with seed=None > - > > Key: SPARK-15741 > URL: https://issues.apache.org/jira/browse/SPARK-15741 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Reporter: Bryan Cutler >Priority: Minor > > Several places in PySpark ML have Params._setDefault with a seed param equal > to {{None}}. This is unnecessary as it will translate to a {{0}} even though > the param has a fixed value based on the hashed classname by default. > Currently, the ALS doc test output depends on this happening and would be > clearer and more stable if it was explicitly set to {{0}}. These should be > cleaned up for stability and consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15952) "show databases" does not get sorted result
Bo Meng created SPARK-15952: --- Summary: "show databases" does not get sorted result Key: SPARK-15952 URL: https://issues.apache.org/jira/browse/SPARK-15952 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng Two issues I've found with the "show databases" command: 1. The returned database name list is not sorted; it is only sorted when "like" is used together with it (Hive will always return a sorted list). 2. When it is used as sql("show databases").show, it will output a table with a column named "result", but sql("show tables").show will output the column name "tableName", so I think we should be consistent and use "databaseName" at least. I will make a PR shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
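Both fixes can be sketched with plain Python standing in for the command implementation. `fnmatch` approximates the SQL `like` pattern, and the function and column names are illustrative:

```python
import fnmatch

def show_databases(names, like=None):
    """Return database names sorted (as Hive does), under a consistent
    column name ('databaseName' rather than 'result')."""
    if like is not None:
        names = [n for n in names if fnmatch.fnmatchcase(n, like)]
    return {"databaseName": sorted(names)}
```

The key point is that sorting happens unconditionally, not only on the filtered path.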
[jira] [Reopened] (SPARK-15892) Incorrectly merged AFTAggregator with zero total count
[ https://issues.apache.org/jira/browse/SPARK-15892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley reopened SPARK-15892: --- > Incorrectly merged AFTAggregator with zero total count > -- > > Key: SPARK-15892 > URL: https://issues.apache.org/jira/browse/SPARK-15892 > Project: Spark > Issue Type: Bug > Components: Examples, ML, PySpark >Affects Versions: 1.6.1, 2.0.0 >Reporter: Joseph K. Bradley >Assignee: Hyukjin Kwon > Fix For: 2.0.0 > > > Running the example (after the fix in > [https://github.com/apache/spark/pull/13393]) causes this failure: > {code} > Traceback (most recent call last): > > File > "/Users/josephkb/spark/examples/src/main/python/ml/aft_survival_regression.py", > line 49, in > model = aft.fit(training) > File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/base.py", > line 64, in fit > File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", > line 213, in _fit > File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", > line 210, in _fit_java > File > "/Users/josephkb/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", > line 933, in __call__ > File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", > line 79, in deco > pyspark.sql.utils.IllegalArgumentException: u'requirement failed: The number > of instances should be greater than 0.0, but got 0.' > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15892) Incorrectly merged AFTAggregator with zero total count
[ https://issues.apache.org/jira/browse/SPARK-15892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-15892: -- Fix Version/s: (was: 1.6.2) > Incorrectly merged AFTAggregator with zero total count > -- > > Key: SPARK-15892 > URL: https://issues.apache.org/jira/browse/SPARK-15892 > Project: Spark > Issue Type: Bug > Components: Examples, ML, PySpark >Affects Versions: 1.6.1, 2.0.0 >Reporter: Joseph K. Bradley >Assignee: Hyukjin Kwon > Fix For: 2.0.0 > > > Running the example (after the fix in > [https://github.com/apache/spark/pull/13393]) causes this failure: > {code} > Traceback (most recent call last): > > File > "/Users/josephkb/spark/examples/src/main/python/ml/aft_survival_regression.py", > line 49, in > model = aft.fit(training) > File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/base.py", > line 64, in fit > File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", > line 213, in _fit > File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", > line 210, in _fit_java > File > "/Users/josephkb/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", > line 933, in __call__ > File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", > line 79, in deco > pyspark.sql.utils.IllegalArgumentException: u'requirement failed: The number > of instances should be greater than 0.0, but got 0.' > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15951) Change Executors Page to use datatables to support sorting columns and searching
[ https://issues.apache.org/jira/browse/SPARK-15951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15951: Assignee: Apache Spark > Change Executors Page to use datatables to support sorting columns and > searching > > > Key: SPARK-15951 > URL: https://issues.apache.org/jira/browse/SPARK-15951 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Kishor Patil >Assignee: Apache Spark > Fix For: 2.1.0 > > > Support column sorting and search for the Executors page using jQuery DataTables > and the REST API. Before this commit, the executors page was generated as hard-coded > HTML and could not support search; also, sorting was disabled if there was > any application with more than one attempt. Supporting search and sort > (over all applications rather than the 20 entries on the current page) in any > case will greatly improve the user experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15951) Change Executors Page to use datatables to support sorting columns and searching
[ https://issues.apache.org/jira/browse/SPARK-15951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330629#comment-15330629 ] Apache Spark commented on SPARK-15951: -- User 'kishorvpatil' has created a pull request for this issue: https://github.com/apache/spark/pull/13670 > Change Executors Page to use datatables to support sorting columns and > searching > > > Key: SPARK-15951 > URL: https://issues.apache.org/jira/browse/SPARK-15951 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Kishor Patil > Fix For: 2.1.0 > > > Support column sorting and search for the Executors page using jQuery DataTables > and the REST API. Before this commit, the executors page was generated as hard-coded > HTML and could not support search; also, sorting was disabled if there was > any application with more than one attempt. Supporting search and sort > (over all applications rather than the 20 entries on the current page) in any > case will greatly improve the user experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15951) Change Executors Page to use datatables to support sorting columns and searching
[ https://issues.apache.org/jira/browse/SPARK-15951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15951: Assignee: (was: Apache Spark) > Change Executors Page to use datatables to support sorting columns and > searching > > > Key: SPARK-15951 > URL: https://issues.apache.org/jira/browse/SPARK-15951 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Kishor Patil > Fix For: 2.1.0 > > > Support column sorting and searching on the Executors page using jQuery DataTables > and the REST API. Before this change, the Executors page was generated as > hard-coded HTML and could not support search; sorting was also disabled if any > application had more than one attempt. Supporting search and sort (over all > applications rather than the 20 entries on the current page) will greatly > improve the user experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330552#comment-15330552 ] Cody Koeninger commented on SPARK-12177: I can rename it to spark-streaming-kafka-0-10 to match the change made for the 0.8 consumer. Mark, have you (or anyone else) actually tried this PR out using TLS? > Update KafkaDStreams to new Kafka 0.10 Consumer API > --- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 has already been released, and it introduces a new consumer API that is > not compatible with the old one. So, I added the new consumer API in separate > classes in the package org.apache.spark.streaming.kafka.v09 with the changed API. I > didn't remove the old classes, for backward compatibility: users will not need > to change their old Spark applications when they upgrade to a new Spark version. > Please review my changes -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13928) Move org.apache.spark.Logging into org.apache.spark.internal.Logging
[ https://issues.apache.org/jira/browse/SPARK-13928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330550#comment-15330550 ] Russell Alexander Spitzer commented on SPARK-13928: --- So users (like me ;)) need to write their own Logging trait now? I'm a little confused based on the description. > Move org.apache.spark.Logging into org.apache.spark.internal.Logging > > > Key: SPARK-13928 > URL: https://issues.apache.org/jira/browse/SPARK-13928 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Reynold Xin >Assignee: Wenchen Fan > Fix For: 2.0.0 > > > Logging was made private in Spark 2.0. If we move it, then users would be > able to create a Logging trait themselves to avoid changing their own code. > Alternatively, we can also provide a compatibility package that adds > logging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
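A minimal sketch of what such a user-side replacement could look like. This is an assumption for illustration, not Spark's actual trait: Spark's own Logging wraps slf4j, while this dependency-free version uses java.util.logging, and it is written in Java (the same language as the streaming example later in this thread digest); Scala users would write the equivalent as a small trait.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical user-side stand-in for the now-private org.apache.spark.Logging.
// Backed by java.util.logging to avoid any external dependency.
interface Logging {
    // One logger per concrete class name, cached by the JUL LogManager.
    default Logger log() {
        return Logger.getLogger(getClass().getName());
    }

    // Supplier-based so the message string is only built when INFO is enabled.
    default void logInfo(Supplier<String> msg) {
        Logger l = log();
        if (l.isLoggable(Level.INFO)) {
            l.info(msg.get());
        }
    }
}

public class Demo implements Logging {
    static int built = 0;

    public static void main(String[] args) {
        Demo d = new Demo();
        d.log().setLevel(Level.WARNING);  // INFO is now disabled
        d.logInfo(() -> { built++; return "expensive message"; });
        System.out.println(built);        // prints 0: the message was never built
    }
}
```

Classes that previously mixed in Spark's Logging could switch to such a shim with minimal code changes, at the cost of a different logging backend.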
[jira] [Commented] (SPARK-15905) Driver hung while writing to console progress bar
[ https://issues.apache.org/jira/browse/SPARK-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330529#comment-15330529 ] Shixiong Zhu commented on SPARK-15905: -- Did you happen to block the stdout or stderr? Such as the disk is full and log4j cannot flush logs to the disk? > Driver hung while writing to console progress bar > - > > Key: SPARK-15905 > URL: https://issues.apache.org/jira/browse/SPARK-15905 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Tejas Patil >Priority: Minor > > This leads to driver being not able to get heartbeats from its executors and > job being stuck. After looking at the locking dependency amongst the driver > threads per the jstack, this is where the driver seems to be stuck. > {noformat} > "refresh progress" #113 daemon prio=5 os_prio=0 tid=0x7f7986cbc800 > nid=0x7887d runnable [0x7f6d3507a000] >java.lang.Thread.State: RUNNABLE > at java.io.FileOutputStream.writeBytes(Native Method) > at java.io.FileOutputStream.write(FileOutputStream.java:326) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) > - locked <0x7f6eb81dd290> (a java.io.BufferedOutputStream) > at java.io.PrintStream.write(PrintStream.java:482) >- locked <0x7f6eb81dd258> (a java.io.PrintStream) > at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) > at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) > at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) > - locked <0x7f6eb81dd400> (a java.io.OutputStreamWriter) > at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) > at java.io.PrintStream.write(PrintStream.java:527) > - locked <0x7f6eb81dd258> (a java.io.PrintStream) > at java.io.PrintStream.print(PrintStream.java:669) > at > org.apache.spark.ui.ConsoleProgressBar.show(ConsoleProgressBar.scala:99) > at > 
org.apache.spark.ui.ConsoleProgressBar.org$apache$spark$ui$ConsoleProgressBar$$refresh(ConsoleProgressBar.scala:69) > - locked <0x7f6ed33b48a0> (a > org.apache.spark.ui.ConsoleProgressBar) > at > org.apache.spark.ui.ConsoleProgressBar$$anon$1.run(ConsoleProgressBar.scala:53) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
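A possible mitigation while the root cause is investigated (an assumption on my part, not something proposed in the thread): the console progress bar that this thread is blocked in can be turned off entirely, so the "refresh progress" timer never writes to the console.

```shell
# spark.ui.showConsoleProgress controls whether ConsoleProgressBar is created;
# with it off, the "refresh progress" thread in the stack trace never starts.
# [jar-file-path] and [package].App are placeholders, as elsewhere in this thread.
spark-submit \
  --conf spark.ui.showConsoleProgress=false \
  --class [package].App \
  [jar-file-path]
```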
[jira] [Commented] (SPARK-15716) Memory usage of driver keeps growing up in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330526#comment-15330526 ] Yan Chen commented on SPARK-15716: -- The checkpoint path is on HDFS. The application is already shutdown, will try to get jstack output next time. > Memory usage of driver keeps growing up in Spark Streaming > -- > > Key: SPARK-15716 > URL: https://issues.apache.org/jira/browse/SPARK-15716 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.4.1 > Environment: Oracle Java 1.8.0_51, 1.8.0_85, 1.8.0_91 and 1.8.0_92 > SUSE Linux, CentOS 6 and CentOS 7 >Reporter: Yan Chen > Original Estimate: 48h > Remaining Estimate: 48h > > Code: > {code:java} > import org.apache.hadoop.io.LongWritable; > import org.apache.hadoop.io.Text; > import org.apache.hadoop.mapreduce.lib.input.TextInputFormat; > import org.apache.spark.SparkConf; > import org.apache.spark.SparkContext; > import org.apache.spark.streaming.Durations; > import org.apache.spark.streaming.StreamingContext; > import org.apache.spark.streaming.api.java.JavaPairDStream; > import org.apache.spark.streaming.api.java.JavaStreamingContext; > import org.apache.spark.streaming.api.java.JavaStreamingContextFactory; > public class App { > public static void main(String[] args) { > final String input = args[0]; > final String check = args[1]; > final long interval = Long.parseLong(args[2]); > final SparkConf conf = new SparkConf(); > conf.set("spark.streaming.minRememberDuration", "180s"); > conf.set("spark.streaming.receiver.writeAheadLog.enable", "true"); > conf.set("spark.streaming.unpersist", "true"); > conf.set("spark.streaming.ui.retainedBatches", "10"); > conf.set("spark.ui.retainedJobs", "10"); > conf.set("spark.ui.retainedStages", "10"); > conf.set("spark.worker.ui.retainedExecutors", "10"); > conf.set("spark.worker.ui.retainedDrivers", "10"); > conf.set("spark.sql.ui.retainedExecutions", "10"); > JavaStreamingContextFactory jscf = () -> { > 
SparkContext sc = new SparkContext(conf); > sc.setCheckpointDir(check); > StreamingContext ssc = new StreamingContext(sc, > Durations.milliseconds(interval)); > JavaStreamingContext jssc = new JavaStreamingContext(ssc); > jssc.checkpoint(check); > // setup pipeline here > JavaPairDStream<LongWritable, Text> inputStream = > jssc.fileStream( > input, > LongWritable.class, > Text.class, > TextInputFormat.class, > (filepath) -> Boolean.TRUE, > false > ); > JavaPairDStream<LongWritable, Text> usbk = inputStream > .updateStateByKey((current, state) -> state); > usbk.checkpoint(Durations.seconds(10)); > usbk.foreachRDD(rdd -> { > rdd.count(); > System.out.println("usbk: " + rdd.toDebugString().split("\n").length); > return null; > }); > return jssc; > }; > JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(check, jscf); > jssc.start(); > jssc.awaitTermination(); > } > } > {code} > Command used to run the code: > {code:none} > spark-submit --keytab [keytab] --principal [principal] --class [package].App > --master yarn --driver-memory 1g --executor-memory 1G --conf > "spark.driver.maxResultSize=0" --conf "spark.logConf=true" --conf > "spark.executor.instances=2" --conf > "spark.executor.extraJavaOptions=-XX:+PrintFlagsFinal -XX:+PrintReferenceGC > -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps > -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions" --conf > "spark.driver.extraJavaOptions=-Xloggc:/[dir]/memory-gc.log > -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails > -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy > -XX:+UnlockDiagnosticVMOptions" [jar-file-path] file:///[dir-on-nas-drive] > [dir-on-hdfs] 200 > {code} > It's a very simple piece of code; when I run it, the memory usage of the driver > keeps going up. There is no file input in our runs. Batch interval is set to > 200 milliseconds; processing time for each batch is below 150 milliseconds, > and most batches are below 70 milliseconds. > !http://i.imgur.com/uSzUui6.png! 
> The rightmost four red triangles are full GCs, which were triggered manually > using the "jcmd pid GC.run" command. > I also did more experiments in the second and third comments I posted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15951) Change Executors Page to use datatables to support sorting columns and searching
Kishor Patil created SPARK-15951: Summary: Change Executors Page to use datatables to support sorting columns and searching Key: SPARK-15951 URL: https://issues.apache.org/jira/browse/SPARK-15951 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.0.0 Reporter: Kishor Patil Fix For: 2.1.0 Support column sorting and searching on the Executors page using jQuery DataTables and the REST API. Before this change, the Executors page was generated as hard-coded HTML and could not support search; sorting was also disabled if any application had more than one attempt. Supporting search and sort (over all applications rather than the 20 entries on the current page) will greatly improve the user experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15888) Python UDF over aggregate fails
[ https://issues.apache.org/jira/browse/SPARK-15888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-15888: --- Priority: Blocker (was: Major) > Python UDF over aggregate fails > --- > > Key: SPARK-15888 > URL: https://issues.apache.org/jira/browse/SPARK-15888 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.0.0 >Reporter: Vladimir Feinberg >Priority: Blocker > > This looks like a regression from 1.6.1. > The following notebook runs without error in a Spark 1.6.1 cluster, but fails > in 2.0.0: > https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6001574963454425/3194562079278586/1653464426712019/latest.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15888) Python UDF over aggregate fails
[ https://issues.apache.org/jira/browse/SPARK-15888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-15888: -- Assignee: Davies Liu > Python UDF over aggregate fails > --- > > Key: SPARK-15888 > URL: https://issues.apache.org/jira/browse/SPARK-15888 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.0.0 >Reporter: Vladimir Feinberg >Assignee: Davies Liu >Priority: Blocker > > This looks like a regression from 1.6.1. > The following notebook runs without error in a Spark 1.6.1 cluster, but fails > in 2.0.0: > https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6001574963454425/3194562079278586/1653464426712019/latest.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330500#comment-15330500 ] Mark Grover commented on SPARK-12177: - bq. It's worth mentioning that authentication is also supported via TLS. I am aware of a number of people who are using TLS for both authentication and encryption. So, the security benefit is available now for some people, at least. Fair point, thanks. Ok, so what remains to get this in? 1. The PR (https://github.com/apache/spark/pull/11863) has been reviewed by me, so it probably needs to be reviewed by a committer. 2. Sorry for sounding like a broken record, but I don't think kafka-beta makes much sense as the name for the subproject, especially now that the new consumer API in Kafka 0.10 is not beta. So, some committer buy-in would be more valuable there too. Anything else? > Update KafkaDStreams to new Kafka 0.10 Consumer API > --- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 has already been released, and it introduces a new consumer API that is > not compatible with the old one. So, I added the new consumer API in separate > classes in the package org.apache.spark.streaming.kafka.v09 with the changed API. I > didn't remove the old classes, for backward compatibility: users will not need > to change their old Spark applications when they upgrade to a new Spark version. > Please review my changes -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15716) Memory usage of driver keeps growing up in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330497#comment-15330497 ] Shixiong Zhu commented on SPARK-15716: -- I saw there were a lot of "org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler". Where's the checkpoint path? Looks like the checkpoint writer is pretty slow. Could you also provide the jstack output? > Memory usage of driver keeps growing up in Spark Streaming > -- > > Key: SPARK-15716 > URL: https://issues.apache.org/jira/browse/SPARK-15716 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.4.1 > Environment: Oracle Java 1.8.0_51, 1.8.0_85, 1.8.0_91 and 1.8.0_92 > SUSE Linux, CentOS 6 and CentOS 7 >Reporter: Yan Chen > Original Estimate: 48h > Remaining Estimate: 48h > > Code: > {code:java} > import org.apache.hadoop.io.LongWritable; > import org.apache.hadoop.io.Text; > import org.apache.hadoop.mapreduce.lib.input.TextInputFormat; > import org.apache.spark.SparkConf; > import org.apache.spark.SparkContext; > import org.apache.spark.streaming.Durations; > import org.apache.spark.streaming.StreamingContext; > import org.apache.spark.streaming.api.java.JavaPairDStream; > import org.apache.spark.streaming.api.java.JavaStreamingContext; > import org.apache.spark.streaming.api.java.JavaStreamingContextFactory; > public class App { > public static void main(String[] args) { > final String input = args[0]; > final String check = args[1]; > final long interval = Long.parseLong(args[2]); > final SparkConf conf = new SparkConf(); > conf.set("spark.streaming.minRememberDuration", "180s"); > conf.set("spark.streaming.receiver.writeAheadLog.enable", "true"); > conf.set("spark.streaming.unpersist", "true"); > conf.set("spark.streaming.ui.retainedBatches", "10"); > conf.set("spark.ui.retainedJobs", "10"); > conf.set("spark.ui.retainedStages", "10"); > conf.set("spark.worker.ui.retainedExecutors", "10"); > 
conf.set("spark.worker.ui.retainedDrivers", "10"); > conf.set("spark.sql.ui.retainedExecutions", "10"); > JavaStreamingContextFactory jscf = () -> { > SparkContext sc = new SparkContext(conf); > sc.setCheckpointDir(check); > StreamingContext ssc = new StreamingContext(sc, > Durations.milliseconds(interval)); > JavaStreamingContext jssc = new JavaStreamingContext(ssc); > jssc.checkpoint(check); > // setup pipeline here > JavaPairDStream<LongWritable, Text> inputStream = > jssc.fileStream( > input, > LongWritable.class, > Text.class, > TextInputFormat.class, > (filepath) -> Boolean.TRUE, > false > ); > JavaPairDStream<LongWritable, Text> usbk = inputStream > .updateStateByKey((current, state) -> state); > usbk.checkpoint(Durations.seconds(10)); > usbk.foreachRDD(rdd -> { > rdd.count(); > System.out.println("usbk: " + rdd.toDebugString().split("\n").length); > return null; > }); > return jssc; > }; > JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(check, jscf); > jssc.start(); > jssc.awaitTermination(); > } > } > {code} > Command used to run the code: > {code:none} > spark-submit --keytab [keytab] --principal [principal] --class [package].App > --master yarn --driver-memory 1g --executor-memory 1G --conf > "spark.driver.maxResultSize=0" --conf "spark.logConf=true" --conf > "spark.executor.instances=2" --conf > "spark.executor.extraJavaOptions=-XX:+PrintFlagsFinal -XX:+PrintReferenceGC > -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps > -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions" --conf > "spark.driver.extraJavaOptions=-Xloggc:/[dir]/memory-gc.log > -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails > -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy > -XX:+UnlockDiagnosticVMOptions" [jar-file-path] file:///[dir-on-nas-drive] > [dir-on-hdfs] 200 > {code} > It's a very simple piece of code; when I run it, the memory usage of the driver > keeps going up. There is no file input in our runs. 
Batch interval is set to > 200 milliseconds; processing time for each batch is below 150 milliseconds, > and most batches are below 70 milliseconds. > !http://i.imgur.com/uSzUui6.png! > The rightmost four red triangles are full GCs, which were triggered manually > using the "jcmd pid GC.run" command. > I also did more experiments in the second and third comments I posted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
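Shixiong asked above for jstack output; a sketch of how the requested diagnostics can be captured with standard JDK tools (`<driver-pid>` is a placeholder for the driver JVM's process id):

```shell
# Find the driver JVM, then capture the thread dump and a live-object
# histogram showing what is accumulating on the driver heap.
jps -l                                      # list running JVMs with their pids
jstack <driver-pid> > driver-jstack.txt     # thread dump requested in the thread
jmap -histo:live <driver-pid> | head -n 30  # top heap consumers after a full GC
```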
[jira] [Resolved] (SPARK-15247) sqlCtx.read.parquet yields at least n_executors * n_cores tasks
[ https://issues.apache.org/jira/browse/SPARK-15247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-15247. -- Resolution: Fixed Fix Version/s: 2.0.0 This issue has been resolved by https://github.com/apache/spark/pull/13137. > sqlCtx.read.parquet yields at least n_executors * n_cores tasks > --- > > Key: SPARK-15247 > URL: https://issues.apache.org/jira/browse/SPARK-15247 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Johnny W. >Assignee: Takeshi Yamamuro > Fix For: 2.0.0 > > > sqlCtx.read.parquet always yields at least n_executors * n_cores tasks, even > though this is only 1 very small file > This issue can increase the latency for small jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15247) sqlCtx.read.parquet yields at least n_executors * n_cores tasks
[ https://issues.apache.org/jira/browse/SPARK-15247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-15247: - Assignee: Takeshi Yamamuro > sqlCtx.read.parquet yields at least n_executors * n_cores tasks > --- > > Key: SPARK-15247 > URL: https://issues.apache.org/jira/browse/SPARK-15247 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Johnny W. >Assignee: Takeshi Yamamuro > > sqlCtx.read.parquet always yields at least n_executors * n_cores tasks, even > though this is only 1 very small file > This issue can increase the latency for small jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-15905) Driver hung while writing to console progress bar
[ https://issues.apache.org/jira/browse/SPARK-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330482#comment-15330482 ] Tejas Patil edited comment on SPARK-15905 at 6/14/16 8:12 PM: -- [~zsxwing] >> Do you have the whole jstack output? I will not be able to share it as is .. but then looking at the entire 7k lines of jstack file and removing stuff like ip address or any company internal stuff seems to be lot of work to me. >> Could you check you disk? Maybe some bad disks cause the hang. At the time this happened, I did not notice any problems with disk on the box. However, will keep an eye about that next time. >> By the way, how did you use Spark? Did you just run it or call it via some >> Process APIs? We run spark jobs directly via spark-shell was (Author: tejasp): @zsxwing >> Do you have the whole jstack output? I will not be able to share it as is .. but then looking at the entire 7k lines of jstack file and removing stuff like ip address or any company internal stuff seems to be lot of work to me. >> Could you check you disk? Maybe some bad disks cause the hang. At the time this happened, I did not notice any problems with disk on the box. However, will keep an eye about that next time. >> By the way, how did you use Spark? Did you just run it or call it via some >> Process APIs? We run spark jobs directly via spark-shell > Driver hung while writing to console progress bar > - > > Key: SPARK-15905 > URL: https://issues.apache.org/jira/browse/SPARK-15905 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Tejas Patil >Priority: Minor > > This leads to driver being not able to get heartbeats from its executors and > job being stuck. After looking at the locking dependency amongst the driver > threads per the jstack, this is where the driver seems to be stuck. 
> {noformat} > "refresh progress" #113 daemon prio=5 os_prio=0 tid=0x7f7986cbc800 > nid=0x7887d runnable [0x7f6d3507a000] >java.lang.Thread.State: RUNNABLE > at java.io.FileOutputStream.writeBytes(Native Method) > at java.io.FileOutputStream.write(FileOutputStream.java:326) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) > - locked <0x7f6eb81dd290> (a java.io.BufferedOutputStream) > at java.io.PrintStream.write(PrintStream.java:482) >- locked <0x7f6eb81dd258> (a java.io.PrintStream) > at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) > at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) > at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) > - locked <0x7f6eb81dd400> (a java.io.OutputStreamWriter) > at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) > at java.io.PrintStream.write(PrintStream.java:527) > - locked <0x7f6eb81dd258> (a java.io.PrintStream) > at java.io.PrintStream.print(PrintStream.java:669) > at > org.apache.spark.ui.ConsoleProgressBar.show(ConsoleProgressBar.scala:99) > at > org.apache.spark.ui.ConsoleProgressBar.org$apache$spark$ui$ConsoleProgressBar$$refresh(ConsoleProgressBar.scala:69) > - locked <0x7f6ed33b48a0> (a > org.apache.spark.ui.ConsoleProgressBar) > at > org.apache.spark.ui.ConsoleProgressBar$$anon$1.run(ConsoleProgressBar.scala:53) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15905) Driver hung while writing to console progress bar
[ https://issues.apache.org/jira/browse/SPARK-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330482#comment-15330482 ] Tejas Patil commented on SPARK-15905: - @zsxwing >> Do you have the whole jstack output? I will not be able to share it as is .. but then looking at the entire 7k lines of jstack file and removing stuff like ip address or any company internal stuff seems to be lot of work to me. >> Could you check you disk? Maybe some bad disks cause the hang. At the time this happened, I did not notice any problems with disk on the box. However, will keep an eye about that next time. >> By the way, how did you use Spark? Did you just run it or call it via some >> Process APIs? We run spark jobs directly via spark-shell > Driver hung while writing to console progress bar > - > > Key: SPARK-15905 > URL: https://issues.apache.org/jira/browse/SPARK-15905 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Tejas Patil >Priority: Minor > > This leads to driver being not able to get heartbeats from its executors and > job being stuck. After looking at the locking dependency amongst the driver > threads per the jstack, this is where the driver seems to be stuck. 
> {noformat} > "refresh progress" #113 daemon prio=5 os_prio=0 tid=0x7f7986cbc800 > nid=0x7887d runnable [0x7f6d3507a000] >java.lang.Thread.State: RUNNABLE > at java.io.FileOutputStream.writeBytes(Native Method) > at java.io.FileOutputStream.write(FileOutputStream.java:326) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) > - locked <0x7f6eb81dd290> (a java.io.BufferedOutputStream) > at java.io.PrintStream.write(PrintStream.java:482) >- locked <0x7f6eb81dd258> (a java.io.PrintStream) > at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) > at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) > at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) > - locked <0x7f6eb81dd400> (a java.io.OutputStreamWriter) > at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) > at java.io.PrintStream.write(PrintStream.java:527) > - locked <0x7f6eb81dd258> (a java.io.PrintStream) > at java.io.PrintStream.print(PrintStream.java:669) > at > org.apache.spark.ui.ConsoleProgressBar.show(ConsoleProgressBar.scala:99) > at > org.apache.spark.ui.ConsoleProgressBar.org$apache$spark$ui$ConsoleProgressBar$$refresh(ConsoleProgressBar.scala:69) > - locked <0x7f6ed33b48a0> (a > org.apache.spark.ui.ConsoleProgressBar) > at > org.apache.spark.ui.ConsoleProgressBar$$anon$1.run(ConsoleProgressBar.scala:53) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15904) High Memory Pressure using MLlib K-means
[ https://issues.apache.org/jira/browse/SPARK-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330480#comment-15330480 ] Sean Owen commented on SPARK-15904: --- I don't think the problem is your code. You're allocating on the one hand too little memory (OOME), and on the other hand too much (swapping). > High Memory Pressure using MLlib K-means > > > Key: SPARK-15904 > URL: https://issues.apache.org/jira/browse/SPARK-15904 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.6.1 > Environment: Mac OS X 10.11.6beta on Macbook Pro 13" mid-2012. 16GB > of RAM. >Reporter: Alessio >Priority: Minor > > Running MLlib K-Means on a ~400MB dataset (12 partitions), persisted on > Memory and Disk. > Everything's fine, although at the end of K-Means, after the number of > iterations, the cost function value and the running time there's a nice > "Removing RDD from persistent list" stage. However, during this stage > there's a high memory pressure. Weird, since RDDs are about to be removed. > Full log of this stage: > 16/06/12 20:37:33 INFO clustering.KMeans: Run 0 finished in 14 iterations > 16/06/12 20:37:33 INFO clustering.KMeans: Iterations took 694.544 seconds. > 16/06/12 20:37:33 INFO clustering.KMeans: KMeans converged in 14 iterations. > 16/06/12 20:37:33 INFO clustering.KMeans: The cost for the best run is > 49784.87126751288. > 16/06/12 20:37:33 INFO rdd.MapPartitionsRDD: Removing RDD 781 from > persistence list > 16/06/12 20:37:33 INFO storage.BlockManager: Removing RDD 781 > 16/06/12 20:37:33 INFO rdd.MapPartitionsRDD: Removing RDD 780 from > persistence list > 16/06/12 20:37:33 INFO storage.BlockManager: Removing RDD 780 > I'm running this K-Means on a 16GB machine, with Spark Context as local[*]. > My machine has an i5 hyperthreaded dual-core, thus [*] means 4. 
> I'm launching this application through spark-submit with --driver-memory 9G -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15849) FileNotFoundException on _temporary while doing saveAsTable to S3
[ https://issues.apache.org/jira/browse/SPARK-15849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330240#comment-15330240 ] Thomas Demoor commented on SPARK-15849: --- Forgot to mention, it also speeds up your writes 2x > FileNotFoundException on _temporary while doing saveAsTable to S3 > - > > Key: SPARK-15849 > URL: https://issues.apache.org/jira/browse/SPARK-15849 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 > Environment: AWS EC2 with spark on yarn and s3 storage >Reporter: Sandeep > > When submitting spark jobs to yarn cluster, I occasionally see these error > messages while doing saveAsTable. I have tried doing this with > spark.speculation=false, and get the same error. These errors are similar to > SPARK-2984, but my jobs are writing to S3(s3n) : > Caused by: java.io.FileNotFoundException: File > s3n://xxx/_temporary/0/task_201606080516_0004_m_79 does not exist. > at > org.apache.hadoop.fs.s3native.NativeS3FileSystem.listStatus(NativeS3FileSystem.java:506) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:360) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310) > at > org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151) > ... 42 more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15849) FileNotFoundException on _temporary while doing saveAsTable to S3
[ https://issues.apache.org/jira/browse/SPARK-15849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330238#comment-15330238 ] Thomas Demoor commented on SPARK-15849: --- Seems like typical list-after-write inconsistency. However, you can avoid this issue. With S3, you should use a direct committer instead of the standard Hadoop ones. Googling for DirectParquetOutputCommitter should help you along. There is no reason to have the "write to _temporary and atomically rename to final version" as S3 can handle concurrent writers. We are working to get this behaviour directly into Hadoop (HADOOP-9565). > FileNotFoundException on _temporary while doing saveAsTable to S3 > - > > Key: SPARK-15849 > URL: https://issues.apache.org/jira/browse/SPARK-15849 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 > Environment: AWS EC2 with spark on yarn and s3 storage >Reporter: Sandeep > > When submitting spark jobs to yarn cluster, I occasionally see these error > messages while doing saveAsTable. I have tried doing this with > spark.speculation=false, and get the same error. These errors are similar to > SPARK-2984, but my jobs are writing to S3(s3n) : > Caused by: java.io.FileNotFoundException: File > s3n://xxx/_temporary/0/task_201606080516_0004_m_79 does not exist. > at > org.apache.hadoop.fs.s3native.NativeS3FileSystem.listStatus(NativeS3FileSystem.java:506) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:360) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310) > at > org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151) > ... 
42 more
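The committer switch suggested in the comment above is a configuration change. A minimal sketch for Spark 1.6.x (property name and class as of that release; verify against your build, and note that direct committers are only safe with speculation disabled, which the reporter already has):

```
# spark-defaults.conf (sketch, Spark 1.6.x)
spark.speculation                          false
spark.sql.parquet.output.committer.class   org.apache.spark.sql.parquet.DirectParquetOutputCommitter
```

With a direct committer, tasks write straight to the final S3 location, so the `_temporary` rename (and the list-after-write inconsistency it trips over) never happens.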
[jira] [Resolved] (SPARK-15895) _common_metadata and _metadata appearing in the inner partitioning dirs of a partitioned parquet dataset break partitioning discovery
[ https://issues.apache.org/jira/browse/SPARK-15895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-15895. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13623 [https://github.com/apache/spark/pull/13623] > _common_metadata and _metadata appearing in the inner partitioning dirs of a > partitioned parquet dataset break partitioning discovery > -- > > Key: SPARK-15895 > URL: https://issues.apache.org/jira/browse/SPARK-15895 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Assignee: Cheng Lian > Fix For: 2.0.0 > > > see > https://issues.apache.org/jira/browse/SPARK-13207?focusedCommentId=15305703=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15305703
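For context on why those summary files break discovery: partition discovery expects `_metadata`/`_common_metadata` only at the dataset root, and treats unexpected files inside `key=value` directories as part of the partition structure. A sketch of the failing layout (paths hypothetical):

```
events/                      <- dataset root: summary files belong here
  _common_metadata
  _metadata
  date=2016-06-14/
    _common_metadata         <- summary files inside a partition directory
    _metadata                   confuse partition discovery (SPARK-15895)
    part-r-00000.gz.parquet
```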
[jira] [Updated] (SPARK-15716) Memory usage of driver keeps growing up in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Chen updated SPARK-15716: - Affects Version/s: (was: 1.6.1) (was: 1.6.0) (was: 1.5.0) (was: 2.0.0) > Memory usage of driver keeps growing up in Spark Streaming > -- > > Key: SPARK-15716 > URL: https://issues.apache.org/jira/browse/SPARK-15716 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.4.1 > Environment: Oracle Java 1.8.0_51, 1.8.0_85, 1.8.0_91 and 1.8.0_92 > SUSE Linux, CentOS 6 and CentOS 7 >Reporter: Yan Chen > Original Estimate: 48h > Remaining Estimate: 48h > > Code: > {code:java} > import org.apache.hadoop.io.LongWritable; > import org.apache.hadoop.io.Text; > import org.apache.hadoop.mapreduce.lib.input.TextInputFormat; > import org.apache.spark.SparkConf; > import org.apache.spark.SparkContext; > import org.apache.spark.streaming.Durations; > import org.apache.spark.streaming.StreamingContext; > import org.apache.spark.streaming.api.java.JavaPairDStream; > import org.apache.spark.streaming.api.java.JavaStreamingContext; > import org.apache.spark.streaming.api.java.JavaStreamingContextFactory; > public class App { > public static void main(String[] args) { > final String input = args[0]; > final String check = args[1]; > final long interval = Long.parseLong(args[2]); > final SparkConf conf = new SparkConf(); > conf.set("spark.streaming.minRememberDuration", "180s"); > conf.set("spark.streaming.receiver.writeAheadLog.enable", "true"); > conf.set("spark.streaming.unpersist", "true"); > conf.set("spark.streaming.ui.retainedBatches", "10"); > conf.set("spark.ui.retainedJobs", "10"); > conf.set("spark.ui.retainedStages", "10"); > conf.set("spark.worker.ui.retainedExecutors", "10"); > conf.set("spark.worker.ui.retainedDrivers", "10"); > conf.set("spark.sql.ui.retainedExecutions", "10"); > JavaStreamingContextFactory jscf = () -> { > SparkContext sc = new SparkContext(conf); > sc.setCheckpointDir(check); > 
StreamingContext ssc = new StreamingContext(sc, > Durations.milliseconds(interval)); > JavaStreamingContext jssc = new JavaStreamingContext(ssc); > jssc.checkpoint(check); > // setup pipeline here > JavaPairDStream<LongWritable, Text> inputStream = > jssc.fileStream( > input, > LongWritable.class, > Text.class, > TextInputFormat.class, > (filepath) -> Boolean.TRUE, > false > ); > JavaPairDStream<LongWritable, Text> usbk = inputStream > .updateStateByKey((current, state) -> state); > usbk.checkpoint(Durations.seconds(10)); > usbk.foreachRDD(rdd -> { > rdd.count(); > System.out.println("usbk: " + rdd.toDebugString().split("\n").length); > return null; > }); > return jssc; > }; > JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(check, jscf); > jssc.start(); > jssc.awaitTermination(); > } > } > {code} > Command used to run the code: > {code:none} > spark-submit --keytab [keytab] --principal [principal] --class [package].App > --master yarn --driver-memory 1g --executor-memory 1G --conf > "spark.driver.maxResultSize=0" --conf "spark.logConf=true" --conf > "spark.executor.instances=2" --conf > "spark.executor.extraJavaOptions=-XX:+PrintFlagsFinal -XX:+PrintReferenceGC > -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps > -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions" --conf > "spark.driver.extraJavaOptions=-Xloggc:/[dir]/memory-gc.log > -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails > -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy > -XX:+UnlockDiagnosticVMOptions" [jar-file-path] file:///[dir-on-nas-drive] > [dir-on-hdfs] 200 > {code} > It's a very simple piece of code; when I ran it, the memory usage of the driver > kept going up. There is no file input in our runs. The batch interval is set to > 200 milliseconds; processing time for each batch is below 150 milliseconds, > with most below 70 milliseconds. > !http://i.imgur.com/uSzUui6.png!
> The rightmost four red triangles are full GCs, which were triggered manually > by using the "jcmd pid GC.run" command. > I also did more experiments in the second and third comments I posted.
[jira] [Commented] (SPARK-15915) CacheManager should use canonicalized plan for planToCache.
[ https://issues.apache.org/jira/browse/SPARK-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330175#comment-15330175 ] Apache Spark commented on SPARK-15915: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/13668 > CacheManager should use canonicalized plan for planToCache. > --- > > Key: SPARK-15915 > URL: https://issues.apache.org/jira/browse/SPARK-15915 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Fix For: 2.0.0 > > > {{DataFrame}} with plan overriding {{sameResult}} but not using canonicalized > plan to compare can't cacheTable. > The example is like: > {code} > val localRelation = Seq(1, 2, 3).toDF() > localRelation.createOrReplaceTempView("localRelation") > spark.catalog.cacheTable("localRelation") > assert( > localRelation.queryExecution.withCachedData.collect { > case i: InMemoryRelation => i > }.size == 1) > {code} > and this will fail as: > {noformat} > ArrayBuffer() had size 0 instead of expected size 1 > {noformat} > The reason is that when do {{spark.catalog.cacheTable("localRelation")}}, > {{CacheManager}} tries to cache for the plan wrapped by {{SubqueryAlias}} but > when planning for the DataFrame {{localRelation}}, {{CacheManager}} tries to > find cached table for the not-wrapped plan because the plan for DataFrame > {{localRelation}} is not wrapped. > Some plans like {{LocalRelation}}, {{LogicalRDD}}, etc. override > {{sameResult}} method, but not use canonicalized plan to compare so the > {{CacheManager}} can't detect the plans are the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
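The mismatch described above is easy to reproduce outside Spark: structural equality that includes cosmetic wrapper nodes (like `SubqueryAlias`) misses logically identical plans, while comparing canonicalized plans does not. A minimal, self-contained sketch with hypothetical toy classes (not Spark's actual `LogicalPlan` or `CacheManager`):

```java
import java.util.Objects;

// Toy "plan" node: either a leaf relation or an alias wrapper around a child.
// Hypothetical classes mirroring the SubqueryAlias-vs-bare-plan mismatch;
// not Spark's real types.
final class Plan {
    final String name;  // relation name, or alias label
    final Plan child;   // null for leaf relations

    Plan(String name, Plan child) { this.name = name; this.child = child; }

    // Canonical form: strip alias wrappers so only the underlying relation matters.
    Plan canonicalized() { return child == null ? this : child.canonicalized(); }

    // Naive comparison: structural equality, aliases included (the buggy behavior).
    boolean sameResultNaive(Plan other) {
        return Objects.equals(name, other.name)
            && (child == null ? other.child == null
                              : other.child != null && child.sameResultNaive(other.child));
    }

    // Fixed comparison: compare canonicalized plans, as SPARK-15915 proposes.
    boolean sameResultCanonical(Plan other) {
        return canonicalized().sameResultNaive(other.canonicalized());
    }
}

public class CanonicalizeDemo {
    public static void main(String[] args) {
        Plan bare = new Plan("localRelation", null);
        Plan aliased = new Plan("alias", bare); // wrapped, like SubqueryAlias

        System.out.println(bare.sameResultNaive(aliased));     // false: cache lookup misses
        System.out.println(bare.sameResultCanonical(aliased)); // true: lookup hits
    }
}
```

In the JIRA above, `cacheTable` stores the `SubqueryAlias`-wrapped plan while the DataFrame carries the bare one, so the naive comparison is exactly why the `InMemoryRelation` is never found.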
[jira] [Commented] (SPARK-15934) Return binary mode in ThriftServer
[ https://issues.apache.org/jira/browse/SPARK-15934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330168#comment-15330168 ] Apache Spark commented on SPARK-15934: -- User 'epahomov' has created a pull request for this issue: https://github.com/apache/spark/pull/13667 > Return binary mode in ThriftServer > -- > > Key: SPARK-15934 > URL: https://issues.apache.org/jira/browse/SPARK-15934 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Egor Pahomov >Priority: Critical > > In the spark-2.0.0 preview, binary mode was turned off (SPARK-15095). > This was a gravely irresponsible step, given that binary mode was the default > in 1.6.1 and was then turned off in 2.0.0. > Just to describe the magnitude of harm that not fixing this bug would do in my > organization: > * Tableau works only through the Thrift Server and only with the binary format. > Tableau would not work with spark-2.0.0 at all! > * I have a bunch of analysts in my organization with configured SQL > clients (DataGrip and Squirrel). I would need to go one by one to change the > connection string for them (DataGrip). Squirrel simply does not work with http - > some jar hell in my case. > * Let me not mention all the other stuff that connects to our data > infrastructure through the ThriftServer as a gateway.
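The mode at issue is controlled by a single HiveServer2 configuration key. A sketch of the two settings (key name from HiveServer2; verify how your Spark build surfaces it):

```
# hive-site.xml property, shown in shorthand (sketch)
hive.server2.transport.mode = binary   # Thrift over raw TCP; the 1.6.1 default
hive.server2.transport.mode = http     # Thrift over HTTP; what 2.0.0-preview forces
```

BI tools such as Tableau speak the binary Thrift protocol, which is why removing it breaks their JDBC/ODBC connections outright.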
[jira] [Commented] (SPARK-15767) Decision Tree Regression wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330166#comment-15330166 ] Kai Jiang commented on SPARK-15767: --- Thanks [~shivaram]! Yes, as you said, we should shorten the name of the function. I will open the PR later. > Decision Tree Regression wrapper in SparkR > -- > > Key: SPARK-15767 > URL: https://issues.apache.org/jira/browse/SPARK-15767 > Project: Spark > Issue Type: New Feature > Components: ML, SparkR >Reporter: Kai Jiang >Assignee: Kai Jiang > > Implement a wrapper in SparkR to support decision tree regression. R's native > Decision Tree Regression implementation comes from the rpart package, with the > signature rpart(formula, dataframe, method="anova"). I propose we implement an > API like spark.decisionTreeRegression(dataframe, formula, ...). After having > implemented decision tree classification, we could refactor these two into an > API more like rpart().
[jira] [Updated] (SPARK-15864) Inconsistent Behaviors when Uncaching Non-cached Tables
[ https://issues.apache.org/jira/browse/SPARK-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-15864: Assignee: Xiao Li > Inconsistent Behaviors when Uncaching Non-cached Tables > --- > > Key: SPARK-15864 > URL: https://issues.apache.org/jira/browse/SPARK-15864 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.0 > > > To uncache a table, we have two different APIs. > {{UNCACHE TABLE}} or {{spark.catalog.uncacheTable}} > When the table is not cached, the first way will report nothing. However, the > second way will report a strange error message: > {{requirement failed: Table [a: int] is not cached}}
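The asymmetry is easiest to see side by side (table name hypothetical):

```sql
-- SQL path: uncaching a table that was never cached reports nothing
UNCACHE TABLE t;

-- Catalog API path: the same operation on the same uncached table fails,
-- e.g. spark.catalog.uncacheTable("t")
--   => requirement failed: Table [a: int] is not cached
```

Whichever behavior is chosen as correct, the two entry points should agree; the fix tracked here aligns them.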