[jira] [Updated] (SPARK-12974) Add Python API for spark.ml bisecting k-means
[ https://issues.apache.org/jira/browse/SPARK-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang updated SPARK-12974:
    Component/s: PySpark
                 ML

> Add Python API for spark.ml bisecting k-means
>
> Key: SPARK-12974
> URL: https://issues.apache.org/jira/browse/SPARK-12974
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Reporter: Yanbo Liang
> Priority: Minor
>
> Add Python API for spark.ml bisecting k-means

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12974) Add Python API for spark.ml bisecting k-means
[ https://issues.apache.org/jira/browse/SPARK-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12974:
    Assignee: (was: Apache Spark)

> Add Python API for spark.ml bisecting k-means
>
> Key: SPARK-12974
> URL: https://issues.apache.org/jira/browse/SPARK-12974
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Reporter: Yanbo Liang
> Priority: Minor
>
> Add Python API for spark.ml bisecting k-means
[jira] [Commented] (SPARK-12974) Add Python API for spark.ml bisecting k-means
[ https://issues.apache.org/jira/browse/SPARK-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114258#comment-15114258 ]

Apache Spark commented on SPARK-12974:

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/10889

> Add Python API for spark.ml bisecting k-means
>
> Key: SPARK-12974
> URL: https://issues.apache.org/jira/browse/SPARK-12974
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Reporter: Yanbo Liang
> Priority: Minor
>
> Add Python API for spark.ml bisecting k-means
[jira] [Assigned] (SPARK-12974) Add Python API for spark.ml bisecting k-means
[ https://issues.apache.org/jira/browse/SPARK-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12974:
    Assignee: Apache Spark

> Add Python API for spark.ml bisecting k-means
>
> Key: SPARK-12974
> URL: https://issues.apache.org/jira/browse/SPARK-12974
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Reporter: Yanbo Liang
> Assignee: Apache Spark
> Priority: Minor
>
> Add Python API for spark.ml bisecting k-means
[jira] [Assigned] (SPARK-12973) Support to set priority when submit spark application to YARN
[ https://issues.apache.org/jira/browse/SPARK-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12973:
    Assignee: (was: Apache Spark)

> Support to set priority when submit spark application to YARN
>
> Key: SPARK-12973
> URL: https://issues.apache.org/jira/browse/SPARK-12973
> Project: Spark
> Issue Type: Improvement
> Components: YARN
> Affects Versions: 1.6.1
> Reporter: Chaozhong Yang
[jira] [Assigned] (SPARK-12973) Support to set priority when submit spark application to YARN
[ https://issues.apache.org/jira/browse/SPARK-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12973:
    Assignee: Apache Spark

> Support to set priority when submit spark application to YARN
>
> Key: SPARK-12973
> URL: https://issues.apache.org/jira/browse/SPARK-12973
> Project: Spark
> Issue Type: Improvement
> Components: YARN
> Affects Versions: 1.6.1
> Reporter: Chaozhong Yang
> Assignee: Apache Spark
[jira] [Commented] (SPARK-12973) Support to set priority when submit spark application to YARN
[ https://issues.apache.org/jira/browse/SPARK-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114250#comment-15114250 ]

Apache Spark commented on SPARK-12973:

User 'debugger87' has created a pull request for this issue:
https://github.com/apache/spark/pull/10888

> Support to set priority when submit spark application to YARN
>
> Key: SPARK-12973
> URL: https://issues.apache.org/jira/browse/SPARK-12973
> Project: Spark
> Issue Type: Improvement
> Components: YARN
> Affects Versions: 1.6.1
> Reporter: Chaozhong Yang
[jira] [Created] (SPARK-12974) Add Python API for spark.ml bisecting k-means
Yanbo Liang created SPARK-12974:

Summary: Add Python API for spark.ml bisecting k-means
Key: SPARK-12974
URL: https://issues.apache.org/jira/browse/SPARK-12974
Project: Spark
Issue Type: Improvement
Reporter: Yanbo Liang
Priority: Minor

Add Python API for spark.ml bisecting k-means
[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior
[ https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114721#comment-15114721 ]

Yin Huai commented on SPARK-9740:

[~emlyn] You can use the {{expr}} function provided by {{org.apache.spark.sql.functions}} to do that. For example, {{expr("first(colName, true)")}}.

> first/last aggregate NULL behavior
>
> Key: SPARK-9740
> URL: https://issues.apache.org/jira/browse/SPARK-9740
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Herman van Hovell
> Assignee: Yin Huai
> Labels: releasenotes
> Fix For: 1.6.0
>
> The FIRST/LAST aggregates, implemented as part of the new UDAF interface, return the first or last non-null value (if any) found. This is a departure from the behavior of the old FIRST/LAST aggregates and from the FIRST_VALUE/LAST_VALUE aggregates in Hive, which would return a null value if that happened to be the first/last value seen. SPARK-9592 tries to 'fix' this behavior for the old UDAF interface.
> Hive makes this behavior configurable by adding a skipNulls flag. I would suggest doing the same, and making the default behavior compatible with Hive.
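[Editor's note] The skip-nulls semantics discussed above can be illustrated outside Spark with a small plain-Python emulation. This is an editorial sketch of what FIRST(col, ignoreNulls) computes, not Spark code; the function name is invented for illustration:

```python
def first_value(values, ignore_nulls):
    """Emulate FIRST(col, ignoreNulls): return the first element,
    optionally skipping None values (a sketch, not Spark's implementation)."""
    for v in values:
        if v is not None or not ignore_nulls:
            return v
    return None  # no qualifying value seen

print(first_value([None, 1, 2], ignore_nulls=True))   # 1
print(first_value([None, 1, 2], ignore_nulls=False))  # None
```

With ignoreNulls=false the aggregate can legitimately return null, which is the Hive-compatible default the comment proposes.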
[jira] [Commented] (SPARK-6847) Stack overflow on updateStateByKey which followed by a dstream with checkpoint set
[ https://issues.apache.org/jira/browse/SPARK-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114722#comment-15114722 ]

Jack Hu commented on SPARK-6847:

Tested on the latest 1.6 branch (f913f7e [SPARK-12120][PYSPARK] Improve exception message when failing to init); it still exists.

> Stack overflow on updateStateByKey which followed by a dstream with checkpoint set
>
> Key: SPARK-6847
> URL: https://issues.apache.org/jira/browse/SPARK-6847
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.3.0
> Reporter: Jack Hu
> Labels: StackOverflowError, Streaming
>
> The issue happens with the following sample code: it uses {{updateStateByKey}} followed by a {{map}} with checkpoint interval 10 seconds.
> {code}
> val sparkConf = new SparkConf().setAppName("test")
> val streamingContext = new StreamingContext(sparkConf, Seconds(10))
> streamingContext.checkpoint("""checkpoint""")
> val source = streamingContext.socketTextStream("localhost", )
> val updatedResult = source.map((1, _)).updateStateByKey(
>   (newlist: Seq[String], oldstate: Option[String]) => newlist.headOption.orElse(oldstate))
> updatedResult.map(_._2)
>   .checkpoint(Seconds(10))
>   .foreachRDD((rdd, t) => {
>     println("Deep: " + rdd.toDebugString.split("\n").length)
>     println(t.toString() + ": " + rdd.collect.length)
>   })
> streamingContext.start()
> streamingContext.awaitTermination()
> {code}
> From the output, we can see that the dependency chain keeps growing over time, the {{updateStateByKey}} state never gets check-pointed, and eventually the stack overflow happens.
> Note:
> * The rdd in {{updatedResult.map(_._2)}} gets check-pointed in this case, but the {{updateStateByKey}} state does not.
> * If the {{checkpoint(Seconds(10))}} is removed from the map result ({{updatedResult.map(_._2)}}), the stack overflow does not happen.
[jira] [Created] (SPARK-12975) Eliminate Bucketing Columns that are part of Partitioning Columns
Xiao Li created SPARK-12975:

Summary: Eliminate Bucketing Columns that are part of Partitioning Columns
Key: SPARK-12975
URL: https://issues.apache.org/jira/browse/SPARK-12975
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li

When users are using partitionBy and bucketBy at the same time, some bucketing columns might be part of partitioning columns. For example,
{code}
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "i", "k")
  .sortBy("k")
  .saveAsTable("bucketed_table")
{code}
However, in the above case, adding column `i` is useless. It is just wasting extra CPU when reading or writing bucket tables. Thus, we can automatically remove these overlapping columns from bucketing columns.
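[Editor's note] The proposed elimination can be sketched in plain Python. This is an illustrative helper, not Spark's implementation, and the function name prune_bucket_columns is hypothetical:

```python
def prune_bucket_columns(partition_cols, bucket_cols):
    """Drop bucketing columns that already appear among the partitioning
    columns, preserving the original bucketing order (a sketch of the
    optimization proposed in SPARK-12975)."""
    partitioned = set(partition_cols)
    return [c for c in bucket_cols if c not in partitioned]

# For the example above: partitionBy("i"), bucketBy(8, "i", "k")
print(prune_bucket_columns(["i"], ["i", "k"]))  # ['k']
```

Within a single partition every row shares the same value of "i", so bucketing on it contributes nothing to the bucket assignment.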
[jira] [Created] (SPARK-12976) Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.
Takuya Ueshin created SPARK-12976:

Summary: Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.
Key: SPARK-12976
URL: https://issues.apache.org/jira/browse/SPARK-12976
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Takuya Ueshin

Add LazilyGenerateOrdering to support generated ordering for RangePartitioner of Exchange instead of InterpretedOrdering.
[jira] [Assigned] (SPARK-12901) Refector options to be correctly formed in a case class
[ https://issues.apache.org/jira/browse/SPARK-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12901:
    Assignee: (was: Apache Spark)

> Refector options to be correctly formed in a case class
>
> Key: SPARK-12901
> URL: https://issues.apache.org/jira/browse/SPARK-12901
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Hyukjin Kwon
> Priority: Minor
>
> The {{CSVParameters}} class is a case class but looks more like a normal class.
> This might be refactored similarly to {{JSONOptions}}.
[jira] [Assigned] (SPARK-12901) Refector options to be correctly formed in a case class
[ https://issues.apache.org/jira/browse/SPARK-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12901:
    Assignee: Apache Spark

> Refector options to be correctly formed in a case class
>
> Key: SPARK-12901
> URL: https://issues.apache.org/jira/browse/SPARK-12901
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Hyukjin Kwon
> Assignee: Apache Spark
> Priority: Minor
>
> The {{CSVParameters}} class is a case class but looks more like a normal class.
> This might be refactored similarly to {{JSONOptions}}.
[jira] [Commented] (SPARK-12901) Refector options to be correctly formed in a case class
[ https://issues.apache.org/jira/browse/SPARK-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114761#comment-15114761 ]

Apache Spark commented on SPARK-12901:

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/10895

> Refector options to be correctly formed in a case class
>
> Key: SPARK-12901
> URL: https://issues.apache.org/jira/browse/SPARK-12901
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Hyukjin Kwon
> Priority: Minor
>
> The {{CSVParameters}} class is a case class but looks more like a normal class.
> This might be refactored similarly to {{JSONOptions}}.
[jira] [Commented] (SPARK-12917) Add DML support to Spark SQL for HIVE
[ https://issues.apache.org/jira/browse/SPARK-12917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114616#comment-15114616 ]

Hemang Nagar commented on SPARK-12917:

Update and Delete operations have been supported in Hive since 0.14; we need Spark to support them as well. We also need insert-by-values to be supported: for example, insert into table values (1, "john doe") currently throws an unsupported operation exception in Spark.

> Add DML support to Spark SQL for HIVE
>
> Key: SPARK-12917
> URL: https://issues.apache.org/jira/browse/SPARK-12917
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: Hemang Nagar
> Priority: Blocker
>
> Spark SQL should be updated to support the DML operations that have been supported by Hive since 0.14.
[jira] [Commented] (SPARK-12976) Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.
[ https://issues.apache.org/jira/browse/SPARK-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114745#comment-15114745 ]

Apache Spark commented on SPARK-12976:

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/10894

> Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.
>
> Key: SPARK-12976
> URL: https://issues.apache.org/jira/browse/SPARK-12976
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Takuya Ueshin
>
> Add LazilyGenerateOrdering to support generated ordering for RangePartitioner of Exchange instead of InterpretedOrdering.
[jira] [Assigned] (SPARK-12976) Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.
[ https://issues.apache.org/jira/browse/SPARK-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12976:
    Assignee: Apache Spark

> Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.
>
> Key: SPARK-12976
> URL: https://issues.apache.org/jira/browse/SPARK-12976
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Takuya Ueshin
> Assignee: Apache Spark
>
> Add LazilyGenerateOrdering to support generated ordering for RangePartitioner of Exchange instead of InterpretedOrdering.
[jira] [Assigned] (SPARK-12976) Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.
[ https://issues.apache.org/jira/browse/SPARK-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12976:
    Assignee: (was: Apache Spark)

> Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.
>
> Key: SPARK-12976
> URL: https://issues.apache.org/jira/browse/SPARK-12976
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Takuya Ueshin
>
> Add LazilyGenerateOrdering to support generated ordering for RangePartitioner of Exchange instead of InterpretedOrdering.
[jira] [Commented] (SPARK-12946) The SQL page is empty
[ https://issues.apache.org/jira/browse/SPARK-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114650#comment-15114650 ]

KaiXinXIaoLei commented on SPARK-12946:

I use the default config. In local mode, the problem exists too. To reproduce: build from the master branch, run "bin/spark-sql", and run "create table a(i int);". Finally, check the SQL page at http://IP:4040. I find the page is empty.

> The SQL page is empty
>
> Key: SPARK-12946
> URL: https://issues.apache.org/jira/browse/SPARK-12946
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 1.6.0
> Reporter: KaiXinXIaoLei
> Attachments: SQLpage.png
>
> I run a sql query using "bin/spark-sql --master yarn". Then I open the UI and find the SQL page is empty.
[jira] [Updated] (SPARK-12975) Eliminate Bucketing Columns that are part of Partitioning Columns
[ https://issues.apache.org/jira/browse/SPARK-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-12975:
    Description:
When users are using partitionBy and bucketBy at the same time, some bucketing columns might be part of partitioning columns. For example,
{code}
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "i", "k")
  .sortBy("k")
  .saveAsTable("bucketed_table")
{code}
However, in the above case, adding column `i` is useless. It is just wasting extra CPU when reading or writing bucket tables. Thus, we can automatically remove these overlapping columns from the bucketing columns.

  was:
When users are using partitionBy and bucketBy at the same time, some bucketing columns might be part of partitioning columns. For example,
{code}
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "i", "k")
  .sortBy("k")
  .saveAsTable("bucketed_table")
{code}
However, in the above case, adding column `i` is useless. It is just wasting extra CPU when reading or writing bucket tables. Thus, we can automatically remove these overlapping columns from bucketing columns.

> Eliminate Bucketing Columns that are part of Partitioning Columns
>
> Key: SPARK-12975
> URL: https://issues.apache.org/jira/browse/SPARK-12975
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Xiao Li
>
> When users are using partitionBy and bucketBy at the same time, some bucketing columns might be part of partitioning columns. For example,
> {code}
> df.write
>   .format(source)
>   .partitionBy("i")
>   .bucketBy(8, "i", "k")
>   .sortBy("k")
>   .saveAsTable("bucketed_table")
> {code}
> However, in the above case, adding column `i` is useless. It is just wasting extra CPU when reading or writing bucket tables. Thus, we can automatically remove these overlapping columns from the bucketing columns.
[jira] [Commented] (SPARK-10911) Executors should System.exit on clean shutdown
[ https://issues.apache.org/jira/browse/SPARK-10911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114717#comment-15114717 ]

Yin Huai commented on SPARK-10911:

Can you quote or provide a link to the new comments about this issue?

> Executors should System.exit on clean shutdown
>
> Key: SPARK-10911
> URL: https://issues.apache.org/jira/browse/SPARK-10911
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.5.1
> Reporter: Thomas Graves
> Assignee: Zhuo Liu
> Priority: Minor
>
> Executors should call System.exit on clean shutdown to make sure all user threads exit and the JVM shuts down.
> We ran into a case where an Executor was left around for days trying to shut down because the user code was using a non-daemon thread pool and one of those threads wasn't exiting. We should force the JVM to go away with System.exit.
[jira] [Resolved] (SPARK-2004) Automate QA of Spark Build/Deploy Matrix
[ https://issues.apache.org/jira/browse/SPARK-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-2004.
    Resolution: Later

Resolving this for now since I think this JIRA isn't actionable right now and it's kind of broad / vague. We can re-open when we have more concrete plans.

> Automate QA of Spark Build/Deploy Matrix
>
> Key: SPARK-2004
> URL: https://issues.apache.org/jira/browse/SPARK-2004
> Project: Spark
> Issue Type: Umbrella
> Components: Build, Deploy, Project Infra
> Reporter: Xiangrui Meng
> Assignee: Nicholas Chammas
>
> This is an umbrella JIRA to track QA automation tasks. Spark supports
> * several deploy modes
> ** local
> ** standalone
> ** yarn
> ** mesos
> * three languages
> ** scala
> ** java
> ** python
> * several hadoop versions
> ** 0.x
> ** 1.x
> ** 2.x
> * job submission from different systems
> ** linux
> ** mac os x
> ** windows
> The cross product of them creates a big deployment matrix. QA automation is really necessary to avoid regression.
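[Editor's note] The size of the deployment matrix described in the umbrella issue can be enumerated directly (a quick illustrative computation over the lists quoted above):

```python
from itertools import product

# Dimensions quoted from the SPARK-2004 description.
deploy_modes = ["local", "standalone", "yarn", "mesos"]
languages = ["scala", "java", "python"]
hadoop_versions = ["0.x", "1.x", "2.x"]
client_os = ["linux", "mac os x", "windows"]

# Full cross product: every combination that QA automation would cover.
matrix = list(product(deploy_modes, languages, hadoop_versions, client_os))
print(len(matrix))  # 4 * 3 * 3 * 3 = 108 combinations
```

Even before adding Spark versions themselves, the matrix is over a hundred configurations, which is why manual QA does not scale here.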
[jira] [Resolved] (SPARK-2005) Investigate linux container-based solution
[ https://issues.apache.org/jira/browse/SPARK-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-2005.
    Resolution: Later

Resolving as "later".

> Investigate linux container-based solution
>
> Key: SPARK-2005
> URL: https://issues.apache.org/jira/browse/SPARK-2005
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Reporter: Xiangrui Meng
>
> We can set up a container-based cluster environment and automatically test against a deployment matrix.
[jira] [Updated] (SPARK-12948) Consider reducing size of broadcasts in OrcRelation
[ https://issues.apache.org/jira/browse/SPARK-12948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated SPARK-12948:
    Attachment: SPARK-12948.mem.prof.snapshot.png

> Consider reducing size of broadcasts in OrcRelation
>
> Key: SPARK-12948
> URL: https://issues.apache.org/jira/browse/SPARK-12948
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Rajesh Balamohan
> Attachments: SPARK-12948.mem.prof.snapshot.png, SPARK-12948_cpuProf.png
>
> The size of broadcasted data in OrcRelation was significantly higher when running a query with a large number of partitions (e.g. TPC-DS). Consider reducing the size of the broadcasted data in OrcRelation, as it has an impact on the job runtime.
[jira] [Assigned] (SPARK-12975) Eliminate Bucketing Columns that are part of Partitioning Columns
[ https://issues.apache.org/jira/browse/SPARK-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12975:
    Assignee: Apache Spark

> Eliminate Bucketing Columns that are part of Partitioning Columns
>
> Key: SPARK-12975
> URL: https://issues.apache.org/jira/browse/SPARK-12975
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Xiao Li
> Assignee: Apache Spark
>
> When users are using partitionBy and bucketBy at the same time, some bucketing columns might be part of partitioning columns. For example,
> {code}
> df.write
>   .format(source)
>   .partitionBy("i")
>   .bucketBy(8, "i", "k")
>   .sortBy("k")
>   .saveAsTable("bucketed_table")
> {code}
> However, in the above case, adding column `i` is useless. It is just wasting extra CPU when reading or writing bucket tables. Thus, we can automatically remove these overlapping columns from the bucketing columns.
[jira] [Assigned] (SPARK-12975) Eliminate Bucketing Columns that are part of Partitioning Columns
[ https://issues.apache.org/jira/browse/SPARK-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12975:
    Assignee: (was: Apache Spark)

> Eliminate Bucketing Columns that are part of Partitioning Columns
>
> Key: SPARK-12975
> URL: https://issues.apache.org/jira/browse/SPARK-12975
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Xiao Li
>
> When users are using partitionBy and bucketBy at the same time, some bucketing columns might be part of partitioning columns. For example,
> {code}
> df.write
>   .format(source)
>   .partitionBy("i")
>   .bucketBy(8, "i", "k")
>   .sortBy("k")
>   .saveAsTable("bucketed_table")
> {code}
> However, in the above case, adding column `i` is useless. It is just wasting extra CPU when reading or writing bucket tables. Thus, we can automatically remove these overlapping columns from the bucketing columns.
[jira] [Commented] (SPARK-12975) Eliminate Bucketing Columns that are part of Partitioning Columns
[ https://issues.apache.org/jira/browse/SPARK-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114651#comment-15114651 ]

Apache Spark commented on SPARK-12975:

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/10891

> Eliminate Bucketing Columns that are part of Partitioning Columns
>
> Key: SPARK-12975
> URL: https://issues.apache.org/jira/browse/SPARK-12975
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Xiao Li
>
> When users are using partitionBy and bucketBy at the same time, some bucketing columns might be part of partitioning columns. For example,
> {code}
> df.write
>   .format(source)
>   .partitionBy("i")
>   .bucketBy(8, "i", "k")
>   .sortBy("k")
>   .saveAsTable("bucketed_table")
> {code}
> However, in the above case, adding column `i` is useless. It is just wasting extra CPU when reading or writing bucket tables. Thus, we can automatically remove these overlapping columns from the bucketing columns.
[jira] [Commented] (SPARK-5175) bug in updating counters when starting multiple workers/supervisors in actor-based receiver
[ https://issues.apache.org/jira/browse/SPARK-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114676#comment-15114676 ]

Apache Spark commented on SPARK-5175:

User 'CodingCat' has created a pull request for this issue:
https://github.com/apache/spark/pull/10892

> bug in updating counters when starting multiple workers/supervisors in actor-based receiver
>
> Key: SPARK-5175
> URL: https://issues.apache.org/jira/browse/SPARK-5175
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.2.0
> Reporter: Nan Zhu
>
> When starting multiple workers (ActorReceiver.scala), we don't update the counters in it.
[jira] [Commented] (SPARK-5174) Missing Document for starting multiple workers/supervisors in actor-based receiver
[ https://issues.apache.org/jira/browse/SPARK-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114675#comment-15114675 ]

Apache Spark commented on SPARK-5174:

User 'CodingCat' has created a pull request for this issue:
https://github.com/apache/spark/pull/10892

> Missing Document for starting multiple workers/supervisors in actor-based receiver
>
> Key: SPARK-5174
> URL: https://issues.apache.org/jira/browse/SPARK-5174
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.2.0
> Reporter: Nan Zhu
> Priority: Minor
>
> Currently, the documentation about starting multiple supervisors/workers is missing, though the implementation provides this capability:
> {code:title=ActorReceiver.scala|borderStyle=solid}
> case props: Props =>
>   val worker = context.actorOf(props)
>   logInfo("Started receiver worker at:" + worker.path)
>   sender ! worker
> case (props: Props, name: String) =>
>   val worker = context.actorOf(props, name)
>   logInfo("Started receiver worker at:" + worker.path)
>   sender ! worker
> case _: PossiblyHarmful => hiccups.incrementAndGet()
> case _: Statistics =>
>   val workers = context.children
>   sender ! Statistics(n.get, workers.size, hiccups.get, workers.mkString("\n"))
> {code}
[jira] [Commented] (SPARK-12934) Count-min sketch serialization
[ https://issues.apache.org/jira/browse/SPARK-12934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114682#comment-15114682 ]

Apache Spark commented on SPARK-12934:

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/10893

> Count-min sketch serialization
>
> Key: SPARK-12934
> URL: https://issues.apache.org/jira/browse/SPARK-12934
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Cheng Lian
> Assignee: Cheng Lian
[jira] [Assigned] (SPARK-12934) Count-min sketch serialization
[ https://issues.apache.org/jira/browse/SPARK-12934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12934: Assignee: Apache Spark (was: Cheng Lian) > Count-min sketch serialization > -- > > Key: SPARK-12934 > URL: https://issues.apache.org/jira/browse/SPARK-12934 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12934) Count-min sketch serialization
[ https://issues.apache.org/jira/browse/SPARK-12934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12934: Assignee: Cheng Lian (was: Apache Spark) > Count-min sketch serialization > -- > > Key: SPARK-12934 > URL: https://issues.apache.org/jira/browse/SPARK-12934 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
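The count-min sketch serialization work tracked above can be illustrated with a minimal, self-contained sketch in plain Python. This is an illustrative data structure only: Spark's actual `CountMinSketch` (SPARK-12934) defines its own versioned binary format, and the layout below is not compatible with it.

```python
# Toy count-min sketch with a simple binary round-trip serialization.
# Illustrative only; not Spark's CountMinSketch API or wire format.
import hashlib
import struct


class CountMinSketch:
    def __init__(self, depth=4, width=16):
        self.depth = depth
        self.width = width
        self.tables = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One hashed bucket per row, derived from a per-row salted digest.
        for row in range(self.depth):
            digest = hashlib.md5(("%d:%s" % (row, item)).encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.tables[row][col] += count

    def estimate(self, item):
        # Count-min: minimum over all rows; may overestimate, never under.
        return min(self.tables[row][col] for row, col in self._buckets(item))

    def serialize(self):
        # Header: depth and width; body: all counters as big-endian uint64.
        header = struct.pack(">ii", self.depth, self.width)
        flat = [c for row in self.tables for c in row]
        return header + struct.pack(">%dQ" % (self.depth * self.width), *flat)

    @classmethod
    def deserialize(cls, data):
        depth, width = struct.unpack(">ii", data[:8])
        counts = struct.unpack(">%dQ" % (depth * width), data[8:])
        cms = cls(depth, width)
        for row in range(depth):
            cms.tables[row] = list(counts[row * width:(row + 1) * width])
        return cms


cms = CountMinSketch()
for word in ["spark", "spark", "sql", "spark"]:
    cms.add(word)
restored = CountMinSketch.deserialize(cms.serialize())
```

The round-trip preserves every counter, so `restored.estimate("spark")` is at least 3 (equal to 3 unless hash collisions inflate it).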
[jira] [Updated] (SPARK-12624) When schema is specified, we should give better error message if actual row length doesn't match
[ https://issues.apache.org/jira/browse/SPARK-12624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-12624: - Assignee: Cheng Lian > When schema is specified, we should give better error message if actual row > length doesn't match > > > Key: SPARK-12624 > URL: https://issues.apache.org/jira/browse/SPARK-12624 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Reporter: Reynold Xin >Assignee: Cheng Lian >Priority: Blocker > Fix For: 1.6.1, 2.0.0 > > > The following code snippet reproduces this issue: > {code} > from pyspark.sql.types import StructType, StructField, IntegerType, StringType > from pyspark.sql.types import Row > schema = StructType([StructField("a", IntegerType()), StructField("b", > StringType())]) > rdd = sc.parallelize(range(10)).map(lambda x: Row(a=x)) > df = sqlContext.createDataFrame(rdd, schema) > df.show() > {code} > An unintuitive {{ArrayIndexOutOfBoundsException}} exception is thrown in this > case: > {code} > ... > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.spark.sql.catalyst.expressions.GenericInternalRow.genericGet(rows.scala:227) > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getAs(rows.scala:35) > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.isNullAt(rows.scala:36) > ... > {code} > We should give a better error message here. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12624) When schema is specified, we should give better error message if actual row length doesn't match
[ https://issues.apache.org/jira/browse/SPARK-12624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-12624. -- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved by pull request 10886 [https://github.com/apache/spark/pull/10886] > When schema is specified, we should give better error message if actual row > length doesn't match > > > Key: SPARK-12624 > URL: https://issues.apache.org/jira/browse/SPARK-12624 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Reporter: Reynold Xin >Priority: Blocker > Fix For: 2.0.0, 1.6.1 > > > The following code snippet reproduces this issue: > {code} > from pyspark.sql.types import StructType, StructField, IntegerType, StringType > from pyspark.sql.types import Row > schema = StructType([StructField("a", IntegerType()), StructField("b", > StringType())]) > rdd = sc.parallelize(range(10)).map(lambda x: Row(a=x)) > df = sqlContext.createDataFrame(rdd, schema) > df.show() > {code} > An unintuitive {{ArrayIndexOutOfBoundsException}} exception is thrown in this > case: > {code} > ... > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.spark.sql.catalyst.expressions.GenericInternalRow.genericGet(rows.scala:227) > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getAs(rows.scala:35) > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.isNullAt(rows.scala:36) > ... > {code} > We should give a better error message here. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
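The kind of fix resolved above can be sketched as an up-front length check that fails with a descriptive message instead of letting a short row surface later as an opaque `ArrayIndexOutOfBoundsException`. The function name `verify_row` and the exact message wording below are hypothetical, not Spark's internal API.

```python
# Hypothetical sketch of schema-length validation in the spirit of
# SPARK-12624: reject a row whose length does not match the schema with a
# clear error, before any field access can fail with an index error.

def verify_row(row, field_names):
    if len(row) != len(field_names):
        raise ValueError(
            "Length of object (%d) does not match length of fields (%d): "
            "fields are %s" % (len(row), len(field_names), field_names))
    return row


schema = ["a", "b"]
ok = verify_row((1, "x"), schema)   # lengths match, row passes through
try:
    verify_row((1,), schema)        # too short: descriptive ValueError
    msg = ""
except ValueError as e:
    msg = str(e)
```

Applied to the reproduction in the issue, the row `Row(a=x)` against the two-field schema would fail this check immediately with a message naming both lengths, rather than deep inside `GenericInternalRow.genericGet`.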
[jira] [Commented] (SPARK-12970) Error in documentation
[ https://issues.apache.org/jira/browse/SPARK-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114292#comment-15114292 ] Sean Owen commented on SPARK-12970: --- The example works, except for two minor issues: need to {{import org.apache.spark.sql.types._}} as well, and the double-bracket syntax used in the result that is printed in the last line causes that strange {{@link ...}} to appear. As far as I know the {{Row}} here is of the correct type to use with the struct schema, though it's not shown actually used here. Do you see a particular problem? > Error in documentation > --- > > Key: SPARK-12970 > URL: https://issues.apache.org/jira/browse/SPARK-12970 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.6.0 >Reporter: Haidar Hadi >Priority: Minor > Labels: documentation > > The provided example in this doc > https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/StructType.html > for creating Row from Struct is wrong > // Create a Row with the schema defined by struct > val row = Row(Row(1, 2, true)) > // row: Row = {@link 1,2,true} > > the above example does not create a Row object with schema. > this error is in the scala docs too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4878) driverPropsFetcher causes spurious Akka disassociate errors
[ https://issues.apache.org/jira/browse/SPARK-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114283#comment-15114283 ] Sean Owen commented on SPARK-4878: -- [~viirya] maybe you know better; I still see this code in {{CoarseGrainedExecutorBackend}}, but I am not clear whether it's still live and used? {code} // Bootstrap to fetch the driver's Spark properties. val executorConf = new SparkConf val port = executorConf.getInt("spark.executor.port", 0) val fetcher = RpcEnv.create( "driverPropsFetcher", hostname, port, executorConf, new SecurityManager(executorConf), clientMode = true) val driver = fetcher.setupEndpointRefByURI(driverUrl) val props = driver.askWithRetry[Seq[(String, String)]](RetrieveSparkProps) ++ Seq[(String, String)](("spark.app.id", appId)) fetcher.shutdown() {code} > driverPropsFetcher causes spurious Akka disassociate errors > --- > > Key: SPARK-4878 > URL: https://issues.apache.org/jira/browse/SPARK-4878 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Stephen Haberman >Priority: Minor > > The dedicated Akka system to fetching driver properties seems fine, but it > leads to very misleading "AssociationHandle$Disassociated", dead letter, etc. > sort of messages that can lead the user to believe something is wrong with > the cluster. > (E.g. personally I thought it was a Spark -rc1/-rc2 bug and spent awhile > poking around until I saw in the code that driverPropsFetcher is > purposefully/immediately shutdown.) > Is there any way to cleanly shutdown that initial akka system so that the > driver doesn't log these errors? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12890) Spark SQL query related to only partition fields should not scan the whole data.
[ https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114794#comment-15114794 ] Hyukjin Kwon commented on SPARK-12890: -- In that case, it will not read all the data but only footer (metadata), {{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would be empty because the required column is a partition column. > Spark SQL query related to only partition fields should not scan the whole > data. > > > Key: SPARK-12890 > URL: https://issues.apache.org/jira/browse/SPARK-12890 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Prakash Chockalingam > > I have a SQL query which has only partition fields. The query ends up > scanning all the data which is unnecessary. > Example: select max(date) from table, where the table is partitioned by date. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-12890) Spark SQL query related to only partition fields should not scan the whole data.
[ https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114794#comment-15114794 ] Hyukjin Kwon edited comment on SPARK-12890 at 1/25/16 5:49 AM: --- In that case, it will not read all the data but only footers (metadata) for each file, {{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would be empty because the required column is a partition column. was (Author: hyukjin.kwon): In that case, it will not read all the data but only footer (metadata), {{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would be empty because the required column is a partition column. > Spark SQL query related to only partition fields should not scan the whole > data. > > > Key: SPARK-12890 > URL: https://issues.apache.org/jira/browse/SPARK-12890 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Prakash Chockalingam > > I have a SQL query which has only partition fields. The query ends up > scanning all the data which is unnecessary. > Example: select max(date) from table, where the table is partitioned by date. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12946) The SQL page is empty
[ https://issues.apache.org/jira/browse/SPARK-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114816#comment-15114816 ] KaiXinXIaoLei commented on SPARK-12946: --- This is the same problem as SPARK-12492, so I am closing this JIRA now. Thanks. > The SQL page is empty > - > > Key: SPARK-12946 > URL: https://issues.apache.org/jira/browse/SPARK-12946 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 >Reporter: KaiXinXIaoLei > Attachments: SQLpage.png > > > I run a SQL query using "bin/spark-sql --master yarn". Then I open the UI > and find the SQL page is empty -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12890) Spark SQL query related to only partition fields should not scan the whole data.
[ https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114829#comment-15114829 ] Simeon Simeonov commented on SPARK-12890: - Thanks for the clarification, [~hyukjin.kwon]. Still, there is no reason why it should be looking at the files at all. This is especially a problem when the Parquet files are in an object store such as S3, because there is no such thing as reading the footer of an S3 object. > Spark SQL query related to only partition fields should not scan the whole > data. > > > Key: SPARK-12890 > URL: https://issues.apache.org/jira/browse/SPARK-12890 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Prakash Chockalingam > > I have a SQL query which has only partition fields. The query ends up > scanning all the data which is unnecessary. > Example: select max(date) from table, where the table is partitioned by date. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
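The point being argued above, that a partition-only query never needs file contents, can be shown with a pure-Python sketch that answers `max(date)` from Hive-style `key=value` directory paths alone. This is illustrative only, not Spark's planner code, and the bucket/path names are made up.

```python
# Illustrative sketch: for a table partitioned by `date`, the answer to
# `select max(date) from table` is fully determined by the partition
# directory names, with no file (or footer) reads at all.
import re

paths = [
    "s3://bucket/table/date=2015-05-01/part-00000.parquet",
    "s3://bucket/table/date=2015-06-15/part-00000.parquet",
    "s3://bucket/table/date=2015-03-20/part-00000.parquet",
]


def partition_values(paths, column):
    # Extract the value of a Hive-style partition column from each path.
    pattern = re.compile(r"/%s=([^/]+)/" % re.escape(column))
    return [m.group(1) for p in paths for m in [pattern.search(p)] if m]


max_date = max(partition_values(paths, "date"))
```

Because ISO dates sort lexicographically, `max` over the extracted strings gives the latest partition; the same listing-only approach sidesteps the S3 footer-read concern raised in the comment above.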
[jira] [Commented] (SPARK-12941) Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype
[ https://issues.apache.org/jira/browse/SPARK-12941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114860#comment-15114860 ] Thomas Sebastian commented on SPARK-12941: -- Sure. Working on it. > Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR > datatype > -- > > Key: SPARK-12941 > URL: https://issues.apache.org/jira/browse/SPARK-12941 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.1 > Environment: Apache Spark 1.4.2.2 >Reporter: Jose Martinez Poblete > > When exporting data from Spark to Oracle, string datatypes are translated to > TEXT for Oracle, which leads to the following error > {noformat} > java.sql.SQLSyntaxErrorException: ORA-00902: invalid datatype > {noformat} > As per the following code: > https://github.com/apache/spark/blob/branch-1.4/sql/core/src/main/scala/org/apache/spark/sql/jdbc/jdbc.scala#L144 > See also: > http://stackoverflow.com/questions/31287182/writing-to-oracle-database-using-apache-spark-1-4-0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
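The shape of the fix for the bug above is a dialect-level type-mapping override. The sketch below mirrors the idea of Spark's `JdbcDialect.getJDBCType` in plain Python, but the class names, the `VARCHAR2(255)` size, and the `NUMBER(19,4)` mapping are all hypothetical choices for illustration, not Spark's actual `OracleDialect`.

```python
# Hypothetical sketch of a dialect-level fix in the spirit of SPARK-12941:
# an Oracle dialect overrides the default string mapping ("TEXT", which
# Oracle rejects with "ORA-00902: invalid datatype") with VARCHAR2.

class DefaultDialect:
    def get_jdbc_type(self, spark_type):
        # The problematic generic defaults used for CREATE TABLE DDL.
        defaults = {
            "string": "TEXT",
            "integer": "INTEGER",
            "double": "DOUBLE PRECISION",
        }
        return defaults[spark_type]


class OracleDialect(DefaultDialect):
    def get_jdbc_type(self, spark_type):
        # Oracle-specific overrides; anything unmapped falls back to the
        # default dialect.
        overrides = {"string": "VARCHAR2(255)", "double": "NUMBER(19,4)"}
        return overrides.get(spark_type) or super().get_jdbc_type(spark_type)


oracle_string_type = OracleDialect().get_jdbc_type("string")
```

With such an override registered for Oracle JDBC URLs, the generated DDL would use `VARCHAR2(255)` for string columns and the `ORA-00902` failure would not occur.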
[jira] [Commented] (SPARK-4878) driverPropsFetcher causes spurious Akka disassociate errors
[ https://issues.apache.org/jira/browse/SPARK-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114861#comment-15114861 ] Liang-Chi Hsieh commented on SPARK-4878: I think it is still alive and used. The above code sends the message {code}RetrieveSparkProps{code} to {code}CoarseGrainedSchedulerBackend.DriverEndpoint{code} and receives spark properties back. > driverPropsFetcher causes spurious Akka disassociate errors > --- > > Key: SPARK-4878 > URL: https://issues.apache.org/jira/browse/SPARK-4878 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Stephen Haberman >Priority: Minor > > The dedicated Akka system to fetching driver properties seems fine, but it > leads to very misleading "AssociationHandle$Disassociated", dead letter, etc. > sort of messages that can lead the user to believe something is wrong with > the cluster. > (E.g. personally I thought it was a Spark -rc1/-rc2 bug and spent awhile > poking around until I saw in the code that driverPropsFetcher is > purposefully/immediately shutdown.) > Is there any way to cleanly shutdown that initial akka system so that the > driver doesn't log these errors? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-4878) driverPropsFetcher causes spurious Akka disassociate errors
[ https://issues.apache.org/jira/browse/SPARK-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114861#comment-15114861 ] Liang-Chi Hsieh edited comment on SPARK-4878 at 1/25/16 7:19 AM: - I think it is still alive and used. The above code sends the message {{RetrieveSparkProps}} to {{CoarseGrainedSchedulerBackend.DriverEndpoint}} and receives spark properties back. But of course it doesn't use Akka anymore. was (Author: viirya): I think it is still alive and used. The above code sends the message {{RetrieveSparkProps}} to {{CoarseGrainedSchedulerBackend.DriverEndpoint}} and receives spark properties back. > driverPropsFetcher causes spurious Akka disassociate errors > --- > > Key: SPARK-4878 > URL: https://issues.apache.org/jira/browse/SPARK-4878 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Stephen Haberman >Priority: Minor > > The dedicated Akka system to fetching driver properties seems fine, but it > leads to very misleading "AssociationHandle$Disassociated", dead letter, etc. > sort of messages that can lead the user to believe something is wrong with > the cluster. > (E.g. personally I thought it was a Spark -rc1/-rc2 bug and spent awhile > poking around until I saw in the code that driverPropsFetcher is > purposefully/immediately shutdown.) > Is there any way to cleanly shutdown that initial akka system so that the > driver doesn't log these errors? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-12890) Spark SQL query related to only partition fields should not scan the whole data.
[ https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114794#comment-15114794 ] Hyukjin Kwon edited comment on SPARK-12890 at 1/25/16 5:53 AM: --- In that case, it will not read all the data but only footers (metadata) for each file, {{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would be empty because the required column is a partition column. was (Author: hyukjin.kwon): In that case, it will not read all the data but only footers (metadata) for each file, {{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would be empty because the required column is a partition column. Oh, if you meant not filtering row groups, yes it will read all the row groups > Spark SQL query related to only partition fields should not scan the whole > data. > > > Key: SPARK-12890 > URL: https://issues.apache.org/jira/browse/SPARK-12890 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Prakash Chockalingam > > I have a SQL query which has only partition fields. The query ends up > scanning all the data which is unnecessary. > Example: select max(date) from table, where the table is partitioned by date. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-12890) Spark SQL query related to only partition fields should not scan the whole data.
[ https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114794#comment-15114794 ] Hyukjin Kwon edited comment on SPARK-12890 at 1/25/16 5:52 AM: --- In that case, it will not read all the data but only footers (metadata) for each file, {{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would be empty because the required column is a partition column. Oh, if you meant not filtering row groups, yes it will read all the row groups was (Author: hyukjin.kwon): In that case, it will not read all the data but only footers (metadata) for each file, {{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would be empty because the required column is a partition column. > Spark SQL query related to only partition fields should not scan the whole > data. > > > Key: SPARK-12890 > URL: https://issues.apache.org/jira/browse/SPARK-12890 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Prakash Chockalingam > > I have a SQL query which has only partition fields. The query ends up > scanning all the data which is unnecessary. > Example: select max(date) from table, where the table is partitioned by date. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12946) The SQL page is empty
[ https://issues.apache.org/jira/browse/SPARK-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114838#comment-15114838 ] Josh Rosen commented on SPARK-12946: Ah, thanks for providing these extra details; this is very helpful. My hunch is that DDL operations like CREATE TABLE aren't triggering the execution of Spark jobs, explaining why you don't see any queries on the SQL page. You would likely see some output there if you ran an actual query against that table after creating it, though. > The SQL page is empty > - > > Key: SPARK-12946 > URL: https://issues.apache.org/jira/browse/SPARK-12946 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 >Reporter: KaiXinXIaoLei > Attachments: SQLpage.png > > > I run a SQL query using "bin/spark-sql --master yarn". Then I open the UI > and find the SQL page is empty -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12973) Support to set priority when submit spark application to YARN
[ https://issues.apache.org/jira/browse/SPARK-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114807#comment-15114807 ] Saisai Shao commented on SPARK-12973: - I think there's a similar JIRA SPARK-10879 about this issue, and there's a closed PR about it. > Support to set priority when submit spark application to YARN > - > > Key: SPARK-12973 > URL: https://issues.apache.org/jira/browse/SPARK-12973 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.6.1 >Reporter: Chaozhong Yang > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-4878) driverPropsFetcher causes spurious Akka disassociate errors
[ https://issues.apache.org/jira/browse/SPARK-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114861#comment-15114861 ] Liang-Chi Hsieh edited comment on SPARK-4878 at 1/25/16 7:17 AM: - I think it is still alive and used. The above code sends the message {{RetrieveSparkProps}} to {{CoarseGrainedSchedulerBackend.DriverEndpoint}} and receives spark properties back. was (Author: viirya): I think it is still alive and used. The above code sends the message {code}RetrieveSparkProps{code} to {code}CoarseGrainedSchedulerBackend.DriverEndpoint{code} and receives spark properties back. > driverPropsFetcher causes spurious Akka disassociate errors > --- > > Key: SPARK-4878 > URL: https://issues.apache.org/jira/browse/SPARK-4878 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Stephen Haberman >Priority: Minor > > The dedicated Akka system to fetching driver properties seems fine, but it > leads to very misleading "AssociationHandle$Disassociated", dead letter, etc. > sort of messages that can lead the user to believe something is wrong with > the cluster. > (E.g. personally I thought it was a Spark -rc1/-rc2 bug and spent awhile > poking around until I saw in the code that driverPropsFetcher is > purposefully/immediately shutdown.) > Is there any way to cleanly shutdown that initial akka system so that the > driver doesn't log these errors? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12977) Factoring out StreamingListener and UI to support history UI
Saisai Shao created SPARK-12977: --- Summary: Factoring out StreamingListener and UI to support history UI Key: SPARK-12977 URL: https://issues.apache.org/jira/browse/SPARK-12977 Project: Spark Issue Type: Sub-task Components: Streaming Affects Versions: 1.6.0 Reporter: Saisai Shao -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12783) Dataset map serialization error
[ https://issues.apache.org/jira/browse/SPARK-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114340#comment-15114340 ] Muthu Jayakumar commented on SPARK-12783: - I moved it to another file altogether (as attached). I have another file that has the main thread like shown below.. {code} object SparkJira extends App{ val sc = //get sc. private val sqlContext: SQLContext = sc._2.sqlContext import sqlContext.implicits._ val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), TestCaseClass("2015-05-01", "data2"))).toDF() df1.as[TestCaseClass].map(_.toStr).show() //works fine df1.as[TestCaseClass].map(_.toMyMap).show() //error } {code} > Dataset map serialization error > --- > > Key: SPARK-12783 > URL: https://issues.apache.org/jira/browse/SPARK-12783 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Muthu Jayakumar >Assignee: Wenchen Fan >Priority: Critical > Attachments: MyMap.scala > > > When Dataset API is used to map to another case class, an error is thrown. 
> {code} > case class MyMap(map: Map[String, String]) > case class TestCaseClass(a: String, b: String){ > def toMyMap: MyMap = { > MyMap(Map(a->b)) > } > def toStr: String = { > a > } > } > //Main method section below > import sqlContext.implicits._ > val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), > TestCaseClass("2015-05-01", "data2"))).toDF() > df1.as[TestCaseClass].map(_.toStr).show() //works fine > df1.as[TestCaseClass].map(_.toMyMap).show() //fails > {code} > Error message: > {quote} > Caused by: java.io.NotSerializableException: > scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1 > Serialization stack: > - object not serializable (class: > scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1, value: > package lang) > - field (class: scala.reflect.internal.Types$ThisType, name: sym, type: > class scala.reflect.internal.Symbols$Symbol) > - object (class scala.reflect.internal.Types$UniqueThisType, > java.lang.type) > - field (class: scala.reflect.internal.Types$TypeRef, name: pre, type: > class scala.reflect.internal.Types$Type) > - object (class scala.reflect.internal.Types$ClassNoArgsTypeRef, String) > - field (class: scala.reflect.internal.Types$TypeRef, name: normalized, > type: class scala.reflect.internal.Types$Type) > - object (class scala.reflect.internal.Types$AliasNoArgsTypeRef, String) > - field (class: > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, name: keyType$1, > type: class scala.reflect.api.Types$TypeApi) > - object (class > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, ) > - field (class: org.apache.spark.sql.catalyst.expressions.MapObjects, > name: function, type: interface scala.Function1) > - object (class org.apache.spark.sql.catalyst.expressions.MapObjects, > mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > 
"collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType)) > - field (class: org.apache.spark.sql.catalyst.expressions.Invoke, name: > targetObject, type: class > org.apache.spark.sql.catalyst.expressions.Expression) > - object (class org.apache.spark.sql.catalyst.expressions.Invoke, > invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class > [Ljava.lang.Object;))) > - writeObject data (class: > scala.collection.immutable.List$SerializationProxy) > - object (class scala.collection.immutable.List$SerializationProxy, > scala.collection.immutable.List$SerializationProxy@4c7e3aab) > - writeReplace data (class: > scala.collection.immutable.List$SerializationProxy) > - object (class scala.collection.immutable.$colon$colon, > List(invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class > [Ljava.lang.Object;)), > invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > "collector.MyMap"),valueArray,ArrayType(StringType,true)),StringType),array,ObjectType(class > [Ljava.lang.Object; > - field (class: org.apache.spark.sql.catalyst.expressions.StaticInvoke, > name: arguments, type: interface scala.collection.Seq) >
[jira] [Comment Edited] (SPARK-12783) Dataset map serialization error
[ https://issues.apache.org/jira/browse/SPARK-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114340#comment-15114340 ] Muthu Jayakumar edited comment on SPARK-12783 at 1/24/16 3:12 PM: -- I moved it to another file altogether (as attached). I have another file that has the main thread like shown below.. {code} object SparkJira extends App{ val sc = //get sc. private val sqlContext: SQLContext = sc._2.sqlContext import sqlContext.implicits._ val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), TestCaseClass("2015-05-01", "data2"))).toDF() df1.as[TestCaseClass].map(_.toStr).show() //works fine df1.as[TestCaseClass].map(_.toMyMap).show() //error } {code} I am using 1.6 release version for testing. Would want me to try with some other version? was (Author: babloo80): I moved it to another file altogether (as attached). I have another file that has the main thread like shown below.. {code} object SparkJira extends App{ val sc = //get sc. private val sqlContext: SQLContext = sc._2.sqlContext import sqlContext.implicits._ val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), TestCaseClass("2015-05-01", "data2"))).toDF() df1.as[TestCaseClass].map(_.toStr).show() //works fine df1.as[TestCaseClass].map(_.toMyMap).show() //error } {code} > Dataset map serialization error > --- > > Key: SPARK-12783 > URL: https://issues.apache.org/jira/browse/SPARK-12783 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Muthu Jayakumar >Assignee: Wenchen Fan >Priority: Critical > Attachments: MyMap.scala > > > When Dataset API is used to map to another case class, an error is thrown. 
> {code} > case class MyMap(map: Map[String, String]) > case class TestCaseClass(a: String, b: String){ > def toMyMap: MyMap = { > MyMap(Map(a->b)) > } > def toStr: String = { > a > } > } > //Main method section below > import sqlContext.implicits._ > val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), > TestCaseClass("2015-05-01", "data2"))).toDF() > df1.as[TestCaseClass].map(_.toStr).show() //works fine > df1.as[TestCaseClass].map(_.toMyMap).show() //fails > {code} > Error message: > {quote} > Caused by: java.io.NotSerializableException: > scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1 > Serialization stack: > - object not serializable (class: > scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1, value: > package lang) > - field (class: scala.reflect.internal.Types$ThisType, name: sym, type: > class scala.reflect.internal.Symbols$Symbol) > - object (class scala.reflect.internal.Types$UniqueThisType, > java.lang.type) > - field (class: scala.reflect.internal.Types$TypeRef, name: pre, type: > class scala.reflect.internal.Types$Type) > - object (class scala.reflect.internal.Types$ClassNoArgsTypeRef, String) > - field (class: scala.reflect.internal.Types$TypeRef, name: normalized, > type: class scala.reflect.internal.Types$Type) > - object (class scala.reflect.internal.Types$AliasNoArgsTypeRef, String) > - field (class: > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, name: keyType$1, > type: class scala.reflect.api.Types$TypeApi) > - object (class > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, ) > - field (class: org.apache.spark.sql.catalyst.expressions.MapObjects, > name: function, type: interface scala.Function1) > - object (class org.apache.spark.sql.catalyst.expressions.MapObjects, > mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > 
"collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType)) > - field (class: org.apache.spark.sql.catalyst.expressions.Invoke, name: > targetObject, type: class > org.apache.spark.sql.catalyst.expressions.Expression) > - object (class org.apache.spark.sql.catalyst.expressions.Invoke, > invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class > [Ljava.lang.Object;))) > - writeObject data (class: > scala.collection.immutable.List$SerializationProxy) > - object (class scala.collection.immutable.List$SerializationProxy, > scala.collection.immutable.List$SerializationProxy@4c7e3aab) > - writeReplace data (class: > scala.collection.immutable.List$SerializationProxy) > - object (class scala.collection.immutable.$colon$colon, >
[jira] [Commented] (SPARK-11796) Docker JDBC integration tests fail in Maven build due to dependency issue
[ https://issues.apache.org/jira/browse/SPARK-11796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114383#comment-15114383 ] Mark Grover commented on SPARK-11796: - [~blbradley] I put instructions [here|https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-RunningDockerintegrationtests] on how to make tests pass. > Docker JDBC integration tests fail in Maven build due to dependency issue > - > > Key: SPARK-11796 > URL: https://issues.apache.org/jira/browse/SPARK-11796 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 1.6.0 >Reporter: Josh Rosen >Assignee: Mark Grover > Fix For: 1.6.0 > > > Our new Docker integration tests for JDBC dialects are failing in the Maven > builds. For now, I've disabled this for Maven by adding the > {{-Dtest.exclude.tags=org.apache.spark.tags.DockerTest}} flag to our Jenkins > builds, but we should fix this soon. The test failures seem to be related to > dependency or classpath issues: > {code} > *** RUN ABORTED *** > java.lang.NoSuchMethodError: > org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder; > at > org.glassfish.jersey.apache.connector.ApacheConnector.(ApacheConnector.java:240) > at > org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115) > at > org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418) > at > org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88) > at > org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120) > at > org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117) > at > org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340) > at > org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726) > at > 
org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285) > at > org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126) > ... > {code} > To reproduce locally: {{build/mvn -pl docker-integration-tests package}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12783) Dataset map serialization error
[ https://issues.apache.org/jira/browse/SPARK-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Muthu Jayakumar updated SPARK-12783: Attachment: MyMap.scala > Dataset map serialization error > --- > > Key: SPARK-12783 > URL: https://issues.apache.org/jira/browse/SPARK-12783 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Muthu Jayakumar >Assignee: Wenchen Fan >Priority: Critical > Attachments: MyMap.scala > > > When Dataset API is used to map to another case class, an error is thrown. > {code} > case class MyMap(map: Map[String, String]) > case class TestCaseClass(a: String, b: String){ > def toMyMap: MyMap = { > MyMap(Map(a->b)) > } > def toStr: String = { > a > } > } > //Main method section below > import sqlContext.implicits._ > val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), > TestCaseClass("2015-05-01", "data2"))).toDF() > df1.as[TestCaseClass].map(_.toStr).show() //works fine > df1.as[TestCaseClass].map(_.toMyMap).show() //fails > {code} > Error message: > {quote} > Caused by: java.io.NotSerializableException: > scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1 > Serialization stack: > - object not serializable (class: > scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1, value: > package lang) > - field (class: scala.reflect.internal.Types$ThisType, name: sym, type: > class scala.reflect.internal.Symbols$Symbol) > - object (class scala.reflect.internal.Types$UniqueThisType, > java.lang.type) > - field (class: scala.reflect.internal.Types$TypeRef, name: pre, type: > class scala.reflect.internal.Types$Type) > - object (class scala.reflect.internal.Types$ClassNoArgsTypeRef, String) > - field (class: scala.reflect.internal.Types$TypeRef, name: normalized, > type: class scala.reflect.internal.Types$Type) > - object (class scala.reflect.internal.Types$AliasNoArgsTypeRef, String) > - field (class: > 
org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, name: keyType$1, > type: class scala.reflect.api.Types$TypeApi) > - object (class > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, ) > - field (class: org.apache.spark.sql.catalyst.expressions.MapObjects, > name: function, type: interface scala.Function1) > - object (class org.apache.spark.sql.catalyst.expressions.MapObjects, > mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType)) > - field (class: org.apache.spark.sql.catalyst.expressions.Invoke, name: > targetObject, type: class > org.apache.spark.sql.catalyst.expressions.Expression) > - object (class org.apache.spark.sql.catalyst.expressions.Invoke, > invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class > [Ljava.lang.Object;))) > - writeObject data (class: > scala.collection.immutable.List$SerializationProxy) > - object (class scala.collection.immutable.List$SerializationProxy, > scala.collection.immutable.List$SerializationProxy@4c7e3aab) > - writeReplace data (class: > scala.collection.immutable.List$SerializationProxy) > - object (class scala.collection.immutable.$colon$colon, > List(invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class > [Ljava.lang.Object;)), > invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > 
"collector.MyMap"),valueArray,ArrayType(StringType,true)),StringType),array,ObjectType(class > [Ljava.lang.Object; > - field (class: org.apache.spark.sql.catalyst.expressions.StaticInvoke, > name: arguments, type: interface scala.collection.Seq) > - object (class org.apache.spark.sql.catalyst.expressions.StaticInvoke, > staticinvoke(class > org.apache.spark.sql.catalyst.util.ArrayBasedMapData$,ObjectType(interface > scala.collection.Map),toScalaMap,invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class > >
[jira] [Commented] (SPARK-11796) Docker JDBC integration tests fail in Maven build due to dependency issue
[ https://issues.apache.org/jira/browse/SPARK-11796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114384#comment-15114384 ] Brandon Bradley commented on SPARK-11796: - [~mgrover] The test runs on the command line but not in IntelliJ. Looks like a classpath dependency issue, having trouble sorting it out. > Docker JDBC integration tests fail in Maven build due to dependency issue > - > > Key: SPARK-11796 > URL: https://issues.apache.org/jira/browse/SPARK-11796 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 1.6.0 >Reporter: Josh Rosen >Assignee: Mark Grover > Fix For: 1.6.0 > > > Our new Docker integration tests for JDBC dialects are failing in the Maven > builds. For now, I've disabled this for Maven by adding the > {{-Dtest.exclude.tags=org.apache.spark.tags.DockerTest}} flag to our Jenkins > builds, but we should fix this soon. The test failures seem to be related to > dependency or classpath issues: > {code} > *** RUN ABORTED *** > java.lang.NoSuchMethodError: > org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder; > at > org.glassfish.jersey.apache.connector.ApacheConnector.(ApacheConnector.java:240) > at > org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115) > at > org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418) > at > org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88) > at > org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120) > at > org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117) > at > org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340) > at > org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726) > at > org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285) > at > 
org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126) > ... > {code} > To reproduce locally: {{build/mvn -pl docker-integration-tests package}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12850) Support bucket pruning (predicate pushdown for bucketed tables)
[ https://issues.apache.org/jira/browse/SPARK-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114420#comment-15114420 ] Xiao Li commented on SPARK-12850: - To make the implementation of this JIRA simpler, I will first submit a separate PR for handling the table bucketing when partitioning columns have overlapping columns with the bucketing columns. > Support bucket pruning (predicate pushdown for bucketed tables) > --- > > Key: SPARK-12850 > URL: https://issues.apache.org/jira/browse/SPARK-12850 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin > > We now support bucketing. One optimization opportunity is to push some > predicates into the scan to skip scanning files that definitely won't match > the values. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12970) Error in documentation
[ https://issues.apache.org/jira/browse/SPARK-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114485#comment-15114485 ] Haidar Hadi commented on SPARK-12970: - Let's consider the following code: {code} import org.apache.spark.sql.types._ val struct = StructType(StructField("f1", StringType, true) :: Nil) val row = Row(1) println(row.fieldIndex("f1")) {code} which generates the following error when executed: {quote} Exception in thread "main" java.lang.UnsupportedOperationException: fieldIndex on a Row without schema is undefined. {quote} > Error in documentation > --- > > Key: SPARK-12970 > URL: https://issues.apache.org/jira/browse/SPARK-12970 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.6.0 >Reporter: Haidar Hadi >Priority: Minor > Labels: documentation > > The provided example in this doc > https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/StructType.html > for creating Row from Struct is wrong > // Create a Row with the schema defined by struct > val row = Row(Row(1, 2, true)) > // row: Row = {@link 1,2,true} > > the above example does not create a Row object with schema. > this error is in the scala docs too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
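The exception is expected behavior: `Row(1)` builds a row of values only, and the name-to-index lookup that `fieldIndex` needs lives in the schema, which the `Row(...)` factory never attaches. A minimal model of that behavior follows; these are illustrative stand-ins, not Spark's actual `Row`/`StructType` classes.

```scala
// Minimal model (not Spark's real classes) of why Row.fieldIndex needs a schema.
final case class Field(name: String)
final case class Schema(fields: Vector[Field])

final class SimpleRow(val values: Vector[Any], val schema: Option[Schema]) {
  def fieldIndex(name: String): Int = schema match {
    case None =>
      // Mirrors Spark's message: a row built from bare values has no schema.
      throw new UnsupportedOperationException(
        "fieldIndex on a Row without schema is undefined.")
    case Some(s) =>
      val i = s.fields.indexWhere(_.name == name)
      if (i < 0) throw new IllegalArgumentException(s"""no field named "$name"""")
      i
  }
}
```

In Spark itself, rows that come back from a DataFrame (e.g. via `df.collect()`) do carry their schema, which is why `fieldIndex` works on those but not on a hand-built `Row`.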
[jira] [Commented] (SPARK-11796) Docker JDBC integration tests fail in Maven build due to dependency issue
[ https://issues.apache.org/jira/browse/SPARK-11796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114484#comment-15114484 ] Brandon Bradley commented on SPARK-11796: - [~mgrover] I figured it out! I believe IntelliJ doesn't support shaded dependencies. > Docker JDBC integration tests fail in Maven build due to dependency issue > - > > Key: SPARK-11796 > URL: https://issues.apache.org/jira/browse/SPARK-11796 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 1.6.0 >Reporter: Josh Rosen >Assignee: Mark Grover > Fix For: 1.6.0 > > > Our new Docker integration tests for JDBC dialects are failing in the Maven > builds. For now, I've disabled this for Maven by adding the > {{-Dtest.exclude.tags=org.apache.spark.tags.DockerTest}} flag to our Jenkins > builds, but we should fix this soon. The test failures seem to be related to > dependency or classpath issues: > {code} > *** RUN ABORTED *** > java.lang.NoSuchMethodError: > org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder; > at > org.glassfish.jersey.apache.connector.ApacheConnector.(ApacheConnector.java:240) > at > org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115) > at > org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418) > at > org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88) > at > org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120) > at > org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117) > at > org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340) > at > org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726) > at > org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285) > at > 
org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126) > ... > {code} > To reproduce locally: {{build/mvn -pl docker-integration-tests package}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10498) Add requirements file for create dev python tools
[ https://issues.apache.org/jira/browse/SPARK-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-10498. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10871 [https://github.com/apache/spark/pull/10871] > Add requirements file for create dev python tools > - > > Key: SPARK-10498 > URL: https://issues.apache.org/jira/browse/SPARK-10498 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: holdenk >Priority: Minor > Fix For: 2.0.0 > > > Minor since so few people use them, but it would probably be good to have a > requirements file for our python release tools for easier setup (also version > pinning). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12970) Error in documentation on creating rows with schemas defined by structs
[ https://issues.apache.org/jira/browse/SPARK-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-12970: --- Summary: Error in documentation on creating rows with schemas defined by structs (was: Error in documentation ) > Error in documentation on creating rows with schemas defined by structs > --- > > Key: SPARK-12970 > URL: https://issues.apache.org/jira/browse/SPARK-12970 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.6.0 >Reporter: Haidar Hadi >Priority: Minor > Labels: documentation > > The provided example in this doc > https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/StructType.html > for creating Row from Struct is wrong > // Create a Row with the schema defined by struct > val row = Row(Row(1, 2, true)) > // row: Row = {@link 1,2,true} > > the above example does not create a Row object with schema. > this error is in the scala docs too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12971) Address test isolation problems which broke Hive tests on Hadoop 2.3 SBT build
[ https://issues.apache.org/jira/browse/SPARK-12971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-12971. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10884 [https://github.com/apache/spark/pull/10884] > Address test isolation problems which broke Hive tests on Hadoop 2.3 SBT build > -- > > Key: SPARK-12971 > URL: https://issues.apache.org/jira/browse/SPARK-12971 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Reporter: Josh Rosen >Assignee: Josh Rosen > Fix For: 2.0.0 > > > ErrorPositionSuite and one of the HiveComparisonTest tests have been > consistently failing on the Hadoop 2.3 SBT build (but on no other builds). I > believe that this is due to test isolation issues (e.g. tests sharing state > via the sets of temporary tables that are registered to TestHive). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-11796) Docker JDBC integration tests fail in Maven build due to dependency issue
[ https://issues.apache.org/jira/browse/SPARK-11796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114484#comment-15114484 ] Brandon Bradley edited comment on SPARK-11796 at 1/24/16 7:38 PM: -- [~mgrover] I figured it out! I believe IntelliJ 15 doesn't support shaded dependencies from SBT. It imports jars from the shaded dependencies as well. was (Author: blbradley): [~mgrover] I figured it out! I believe IntelliJ doesn't support shaded dependencies. > Docker JDBC integration tests fail in Maven build due to dependency issue > - > > Key: SPARK-11796 > URL: https://issues.apache.org/jira/browse/SPARK-11796 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 1.6.0 >Reporter: Josh Rosen >Assignee: Mark Grover > Fix For: 1.6.0 > > > Our new Docker integration tests for JDBC dialects are failing in the Maven > builds. For now, I've disabled this for Maven by adding the > {{-Dtest.exclude.tags=org.apache.spark.tags.DockerTest}} flag to our Jenkins > builds, but we should fix this soon. 
The test failures seem to be related to > dependency or classpath issues: > {code} > *** RUN ABORTED *** > java.lang.NoSuchMethodError: > org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder; > at > org.glassfish.jersey.apache.connector.ApacheConnector.(ApacheConnector.java:240) > at > org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115) > at > org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418) > at > org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88) > at > org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120) > at > org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117) > at > org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340) > at > org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726) > at > org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285) > at > org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126) > ... > {code} > To reproduce locally: {{build/mvn -pl docker-integration-tests package}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12970) Error in documentation
[ https://issues.apache.org/jira/browse/SPARK-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114500#comment-15114500 ] Josh Rosen commented on SPARK-12970: Also, general note regarding JIRA titles: "error in documentation" is a really bad title since that could mean anything. Next time, please choose a more descriptive-yet-concise title, since that makes issues easier to search and scan and helps the emails to have better subject lines in our inboxes. > Error in documentation > --- > > Key: SPARK-12970 > URL: https://issues.apache.org/jira/browse/SPARK-12970 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.6.0 >Reporter: Haidar Hadi >Priority: Minor > Labels: documentation > > The provided example in this doc > https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/StructType.html > for creating Row from Struct is wrong > // Create a Row with the schema defined by struct > val row = Row(Row(1, 2, true)) > // row: Row = {@link 1,2,true} > > the above example does not create a Row object with schema. > this error is in the scala docs too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12850) Support bucket pruning (predicate pushdown for bucketed tables)
[ https://issues.apache.org/jira/browse/SPARK-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114503#comment-15114503 ] Reynold Xin commented on SPARK-12850: - I'd just do the simple cases first. > Support bucket pruning (predicate pushdown for bucketed tables) > --- > > Key: SPARK-12850 > URL: https://issues.apache.org/jira/browse/SPARK-12850 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin > > We now support bucketing. One optimization opportunity is to push some > predicates into the scan to skip scanning files that definitely won't match > the values. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12850) Support bucket pruning (predicate pushdown for bucketed tables)
[ https://issues.apache.org/jira/browse/SPARK-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114415#comment-15114415 ] Xiao Li commented on SPARK-12850: - Working on the design and prototype. I noticed a few issues we need to consider: - Partitioning columns could overlap with the bucketing columns; - The predicates we can use for bucket pruning: EqualTo, EqualNullSafe, IsNull, In, InSet; - We need to support mixed And and Or in the filters; - After generating the bucket set we need to scan, we should remove the corresponding filters, if possible. Maybe I will just submit a simplified version first. > Support bucket pruning (predicate pushdown for bucketed tables) > --- > > Key: SPARK-12850 > URL: https://issues.apache.org/jira/browse/SPARK-12850 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin > > We now support bucketing. One optimization opportunity is to push some > predicates into the scan to skip scanning files that definitely won't match > the values. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
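The pruning idea discussed above can be sketched without Spark: for equality-style predicates (EqualTo, In, InSet), each value hashes to exactly one bucket, so the scan can be restricted to that bucket set. This is a hypothetical sketch only — Spark's actual bucketing hash function differs, and `hashCode` below is just a stand-in.

```scala
import scala.collection.immutable.BitSet

// Hypothetical sketch of bucket pruning for EqualTo / In / InSet predicates.
object BucketPruningSketch {
  // Map a (possibly negative) hash into [0, mod).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val r = x % mod
    if (r < 0) r + mod else r
  }

  // WHERE bucketCol IN (v1, v2, ...): each value lives in exactly one bucket,
  // so only those buckets' files need to be scanned.
  def bucketsToScan(numBuckets: Int, inValues: Seq[Any]): BitSet =
    BitSet(inValues.map(v => nonNegativeMod(v.hashCode, numBuckets)): _*)
}
```

For example, with 8 buckets, `bucketsToScan(8, Seq(1, 2, 3))` touches at most 3 of the 8 bucket files; mixed And/Or filters would then reduce to intersecting and unioning such bucket sets before dropping the consumed filters.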
[jira] [Updated] (SPARK-10498) Add requirements file for create dev python tools
[ https://issues.apache.org/jira/browse/SPARK-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10498: --- Assignee: holdenk > Add requirements file for create dev python tools > - > > Key: SPARK-10498 > URL: https://issues.apache.org/jira/browse/SPARK-10498 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: holdenk >Assignee: holdenk >Priority: Minor > Fix For: 2.0.0 > > > Minor since so few people use them, but it would probably be good to have a > requirements file for our python release tools for easier setup (also version > pinning). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12850) Support bucket pruning (predicate pushdown for bucketed tables)
[ https://issues.apache.org/jira/browse/SPARK-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114515#comment-15114515 ] Xiao Li commented on SPARK-12850: - Sure, will give it a try. Thanks! > Support bucket pruning (predicate pushdown for bucketed tables) > --- > > Key: SPARK-12850 > URL: https://issues.apache.org/jira/browse/SPARK-12850 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin > > We now support bucketing. One optimization opportunity is to push some > predicates into the scan to skip scanning files that definitely won't match > the values. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12120) Improve exception message when failing to initialize HiveContext in PySpark
[ https://issues.apache.org/jira/browse/SPARK-12120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-12120: --- Assignee: Jeff Zhang > Improve exception message when failing to initialize HiveContext in PySpark > --- > > Key: SPARK-12120 > URL: https://issues.apache.org/jira/browse/SPARK-12120 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Jeff Zhang >Assignee: Jeff Zhang >Priority: Minor > > I get the following exception message when failing to initialize HiveContext. > This is hard to figure out why HiveContext failed to initialize. Actually I > build spark with hive profile enabled. The reason the HiveContext failed is > due to I didn't start hdfs service. And actually I can see the full > stacktrace in spark-shell. And I also can see the full stack trace in > python2. The issue only exists in python2.x > {code} > Traceback (most recent call last): > File "", line 1, in > File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 430, > in createDataFrame > jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json()) > File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 691, > in _ssql_ctx > "build/sbt assembly", e) > Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run > build/sbt assembly", Py4JJavaError(u'An error occurred while calling > None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o34)) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12970) Error in documentation on creating rows with schemas defined by structs
[ https://issues.apache.org/jira/browse/SPARK-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114539#comment-15114539 ] Haidar Hadi commented on SPARK-12970: - sure [~jrose] I understand. > Error in documentation on creating rows with schemas defined by structs > --- > > Key: SPARK-12970 > URL: https://issues.apache.org/jira/browse/SPARK-12970 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.6.0 >Reporter: Haidar Hadi >Priority: Minor > Labels: documentation > > The provided example in this doc > https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/StructType.html > for creating Row from Struct is wrong > // Create a Row with the schema defined by struct > val row = Row(Row(1, 2, true)) > // row: Row = {@link 1,2,true} > > the above example does not create a Row object with schema. > this error is in the scala docs too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12917) Add DML support to Spark SQL for HIVE
[ https://issues.apache.org/jira/browse/SPARK-12917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114513#comment-15114513 ] Herman van Hovell commented on SPARK-12917: --- Could you be a bit more specific? What DML operations are you missing? > Add DML support to Spark SQL for HIVE > - > > Key: SPARK-12917 > URL: https://issues.apache.org/jira/browse/SPARK-12917 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.6.0 >Reporter: Hemang Nagar >Priority: Blocker > > Spark SQL should be updated to support the DML operations that have been > supported by Hive since 0.14 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12120) Improve exception message when failing to initialize HiveContext in PySpark
[ https://issues.apache.org/jira/browse/SPARK-12120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114523#comment-15114523 ] Josh Rosen commented on SPARK-12120: Fixed in 1.6.1 and 2.0.0. > Improve exception message when failing to initialize HiveContext in PySpark > --- > > Key: SPARK-12120 > URL: https://issues.apache.org/jira/browse/SPARK-12120 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Jeff Zhang >Assignee: Jeff Zhang >Priority: Minor > Fix For: 1.6.1, 2.0.0 > > > I get the following exception message when failing to initialize HiveContext. > This is hard to figure out why HiveContext failed to initialize. Actually I > build spark with hive profile enabled. The reason the HiveContext failed is > due to I didn't start hdfs service. And actually I can see the full > stacktrace in spark-shell. And I also can see the full stack trace in > python2. The issue only exists in python2.x > {code} > Traceback (most recent call last): > File "", line 1, in > File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 430, > in createDataFrame > jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json()) > File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 691, > in _ssql_ctx > "build/sbt assembly", e) > Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run > build/sbt assembly", Py4JJavaError(u'An error occurred while calling > None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o34)) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12120) Improve exception message when failing to initialize HiveContext in PySpark
[ https://issues.apache.org/jira/browse/SPARK-12120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-12120. Resolution: Fixed Fix Version/s: 2.0.0 1.6.1 > Improve exception message when failing to initialize HiveContext in PySpark > --- > > Key: SPARK-12120 > URL: https://issues.apache.org/jira/browse/SPARK-12120 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Jeff Zhang >Assignee: Jeff Zhang >Priority: Minor > Fix For: 1.6.1, 2.0.0 > > > I get the following exception message when HiveContext fails to initialize, > and it makes it hard to figure out why. I actually built Spark with the Hive > profile enabled; the real reason HiveContext failed is that I had not started > the HDFS service. I can see the full stack trace in spark-shell and in > python3; the issue only exists in python2.x > {code} > Traceback (most recent call last): > File "", line 1, in > File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 430, > in createDataFrame > jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json()) > File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 691, > in _ssql_ctx > "build/sbt assembly", e) > Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run > build/sbt assembly", Py4JJavaError(u'An error occurred while calling > None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o34)) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
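The improvement SPARK-12120 asks for boils down to an exception-wrapping pattern: catch the low-level Py4J failure and re-raise with a message that keeps the root cause visible instead of swallowing it behind a generic build hint. A minimal sketch in plain Python — the function and names here are illustrative, not the actual PySpark internals or the actual patch:

```python
# Hypothetical sketch of the error-wrapping pattern; not real PySpark code.
def create_context(factory):
    """Call `factory`; on failure, surface the underlying cause in the message."""
    try:
        return factory()
    except Exception as e:
        # Keep the original error text so users can see the real root cause
        # (e.g. HDFS not running) instead of only a generic build hint.
        raise RuntimeError(
            "Failed to initialize HiveContext. If Spark was built without "
            "Hive support, rebuild with the Hive profile. "
            "Underlying error: " + str(e))

def broken_factory():
    # Stand-in for the Py4J call that fails when HDFS is down.
    raise IOError("Connection refused: namenode:8020")
```

With this pattern, a user who forgot to start HDFS sees the "Connection refused" cause directly in the wrapped message, which is the behavioral improvement the issue describes.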
[jira] [Resolved] (SPARK-10345) Flaky test: HiveCompatibilitySuite.nonblock_op_deduplicate
[ https://issues.apache.org/jira/browse/SPARK-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-10345. Resolution: Cannot Reproduce > Flaky test: HiveCompatibilitySuite.nonblock_op_deduplicate > -- > > Key: SPARK-10345 > URL: https://issues.apache.org/jira/browse/SPARK-10345 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41759/testReport/org.apache.spark.sql.hive.execution/HiveCompatibilitySuite/nonblock_op_deduplicate/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-12970) Error in documentation on creating rows with schemas defined by structs
[ https://issues.apache.org/jira/browse/SPARK-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114485#comment-15114485 ] Haidar Hadi edited comment on SPARK-12970 at 1/24/16 9:09 PM: -- [~srowen] let's consider the following code: {code} import org.apache.spark.sql.types._ val struct = StructType(StructField("f1", StringType, true) :: Nil) val row = Row(1) println(row.fieldIndex("f1")) {code} which fails at runtime with: Exception in thread "main" java.lang.UnsupportedOperationException: fieldIndex on a Row without schema is undefined. So the Row constructor is clearly not picking up the struct schema. > Error in documentation on creating rows with schemas defined by structs > --- > > Key: SPARK-12970 > URL: https://issues.apache.org/jira/browse/SPARK-12970 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.6.0 >Reporter: Haidar Hadi >Priority: Minor > Labels: documentation > > The example provided in > https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/StructType.html > for creating a Row from a struct is wrong: > // Create a Row with the schema defined by struct > val row = Row(Row(1, 2, true)) > // row: Row = {@link 1,2,true} > > The above example does not create a Row object with a schema. > This error is in the Scala docs too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
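The behavior the comment demonstrates can be modeled in a few lines. The sketch below is a toy model in plain Python — hypothetical classes, not Spark's real API — showing why `Row(...)` alone can never answer `fieldIndex`: the schema has to be attached to the row itself, and merely defining a `StructType` nearby does nothing:

```python
# Toy model of the Row/schema contract discussed in SPARK-12970.
# These classes are illustrative only; they are not Spark's actual classes.
class StructType:
    def __init__(self, names):
        self.names = list(names)

class Row:
    def __init__(self, values, schema=None):
        self.values = list(values)
        self.schema = schema  # constructing Row(...) alone attaches no schema

    def field_index(self, name):
        # Mirrors Spark's behavior: without a schema, name lookup is undefined.
        if self.schema is None:
            raise RuntimeError("fieldIndex on a Row without schema is undefined")
        return self.schema.names.index(name)
```

In real Spark, schema-bearing rows are normally produced by the engine (e.g. rows returned from a DataFrame), rather than by calling `Row(...)` directly, which is why the documentation example in question is misleading.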
[jira] [Closed] (SPARK-12858) Remove duplicated code in metrics
[ https://issues.apache.org/jira/browse/SPARK-12858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Fradet closed SPARK-12858. --- Resolution: Not A Problem > Remove duplicated code in metrics > - > > Key: SPARK-12858 > URL: https://issues.apache.org/jira/browse/SPARK-12858 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Benjamin Fradet >Priority: Minor > > I noticed there is some duplicated code in the sinks regarding the poll > period. > Also, parts of the metrics.properties template are unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11796) Docker JDBC integration tests fail in Maven build due to dependency issue
[ https://issues.apache.org/jira/browse/SPARK-11796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114590#comment-15114590 ] Mark Grover commented on SPARK-11796: - Awesome, thanks for sharing. > Docker JDBC integration tests fail in Maven build due to dependency issue > - > > Key: SPARK-11796 > URL: https://issues.apache.org/jira/browse/SPARK-11796 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 1.6.0 >Reporter: Josh Rosen >Assignee: Mark Grover > Fix For: 1.6.0 > > > Our new Docker integration tests for JDBC dialects are failing in the Maven > builds. For now, I've disabled this for Maven by adding the > {{-Dtest.exclude.tags=org.apache.spark.tags.DockerTest}} flag to our Jenkins > builds, but we should fix this soon. The test failures seem to be related to > dependency or classpath issues: > {code} > *** RUN ABORTED *** > java.lang.NoSuchMethodError: > org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder; > at > org.glassfish.jersey.apache.connector.ApacheConnector.<init>(ApacheConnector.java:240) > at > org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115) > at > org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418) > at > org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88) > at > org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120) > at > org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117) > at > org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340) > at > org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726) > at > org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285) > at > org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126) > ... 
> {code} > To reproduce locally: {{build/mvn -pl docker-integration-tests package}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org