[jira] [Commented] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based
[ https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938249#comment-14938249 ] Reynold Xin commented on SPARK-10474: - Ah ic. It is possible that the # core estimation is completely off in Mesos fine grained mode. [~hbogert] can you print the page size and num cores in this function to check their value? https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174 > TungstenAggregation cannot acquire memory for pointer array after switching > to sort-based > - > > Key: SPARK-10474 > URL: https://issues.apache.org/jira/browse/SPARK-10474 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yi Zhou >Assignee: Andrew Or >Priority: Blocker > Fix For: 1.5.1, 1.6.0 > > > In aggregation case, a Lost task happened with below error. > {code} > java.io.IOException: Could not acquire 65536 bytes of memory > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220) > at > org.apache.spark.sql.execution.UnsafeKVExternalSorter.(UnsafeKVExternalSorter.java:126) > at > org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257) > at > org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435) > at > org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:379) > at > org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622) > at > 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110) > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119) > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119) > at > org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > Key SQL Query > {code:sql} > INSERT INTO TABLE test_table > SELECT > ss.ss_customer_sk AS cid, > count(CASE WHEN i.i_class_id=1 THEN 1 ELSE NULL END) AS id1, > count(CASE WHEN i.i_class_id=3 THEN 1 ELSE NULL END) AS id3, > count(CASE WHEN i.i_class_id=5 THEN 1 ELSE NULL END) AS id5, > count(CASE WHEN i.i_class_id=7 THEN 1 ELSE NULL END) AS id7, > count(CASE WHEN i.i_class_id=9 THEN 1 ELSE NULL END) AS id9, > count(CASE WHEN i.i_class_id=11 THEN 1 ELSE NULL END) AS id11, > count(CASE WHEN i.i_class_id=13 THEN 1 ELSE NULL END) AS id13, > count(CASE WHEN i.i_class_id=15 
THEN 1 ELSE NULL END) AS id15, > count(CASE WHEN i.i_class_id=2 THEN 1 ELSE NULL END) AS id2, > count(CASE WHEN i.i_class_id=4 THEN 1 ELSE NULL END) AS id4, > count(CASE WHEN i.i_class_id=6 THEN 1 ELSE NULL END) AS id6, > count(CASE WHEN i.i_class_id=8 THEN 1 ELSE NULL END) AS id8, > count(CASE WHEN i.i_class_id=10 THEN 1 ELSE NULL END) AS id10, > count(CASE WHEN i.i_class_id=14 THEN 1 ELSE NULL END) AS id14, > count(CASE WHEN i.i_class_id=16 THEN 1 ELSE NULL END) AS id16 > FROM store_sales ss > INNER JOIN item i ON ss.ss_item_sk = i.i_item_sk > WHERE i.i_category IN ('Books') > AND ss.ss_customer_sk IS NOT NULL > GROUP BY ss.ss_customer_sk >
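Reynold's comment above concerns how the default page size is derived from the detected core count. A rough Python sketch of that kind of heuristic (illustrative only; the constants, rounding, and clamp bounds are approximations, not Spark's exact ShuffleMemoryManager code) shows why a misreported core count in Mesos fine-grained mode would skew the estimate:

```python
def default_page_size(max_memory_bytes, num_cores,
                      min_page=1 << 20, max_page=64 << 20, safety_factor=16):
    """Approximate sketch of a per-task page-size heuristic: divide the
    shuffle memory budget by the core count, apply a safety factor, round
    down to a power of two, and clamp to [min_page, max_page]."""
    target = max_memory_bytes // num_cores // safety_factor
    rounded = 1 << max(0, target.bit_length() - 1)  # largest power of two <= target
    return min(max_page, max(min_page, rounded))

# With 1 GiB of shuffle memory and 8 detected cores, pages come out around
# 8 MiB. If the runtime reports only 1 core, the estimate balloons to the
# cap, so far fewer pages fit in the budget before acquisition fails.
```

Printing the page size and core count, as suggested, would reveal whether the divisor is wrong in fine-grained mode.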
[jira] [Commented] (SPARK-10855) Add a JDBC dialect for Apache Derby
[ https://issues.apache.org/jira/browse/SPARK-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938334#comment-14938334 ] Rick Hillegas commented on SPARK-10855: --- I intend to create a pull request for this soon after running the tests and style checks. > Add a JDBC dialect for Apache Derby > > > Key: SPARK-10855 > URL: https://issues.apache.org/jira/browse/SPARK-10855 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.0 >Reporter: Rick Hillegas >Priority: Minor > > In particular, it would be good if the dialect could handle Derby's > user-defined types. The following script fails: > {noformat} > import org.apache.spark.sql._ > import org.apache.spark.sql.types._ > // the following script was used to create a Derby table > // which has a column of user-defined type: > // > // create type properties external name 'java.util.Properties' language java; > // > // create function systemProperties() returns properties > // language java parameter style java no sql > // external name 'java.lang.System.getProperties'; > // > // create table propertiesTable( props properties ); > // > // insert into propertiesTable values ( null ), ( systemProperties() ); > // > // select * from propertiesTable; > // cannot handle a table which has a column of type > java.sql.Types.JAVA_OBJECT: > // > // java.sql.SQLException: Unsupported type 2000 > // > val df = sqlContext.read.format("jdbc").options( > Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1", > "dbtable" -> "app.propertiesTable")).load() > // shutdown the Derby engine > val shutdown = sqlContext.read.format("jdbc").options( > Map("url" -> "jdbc:derby:;shutdown=true", > "dbtable" -> "")).load() > exit() > {noformat} > The inability to handle user-defined types probably affects other databases > besides Derby. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
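The "Unsupported type 2000" in the ticket is java.sql.Types.JAVA_OBJECT, the type code Derby reports for user-defined-type columns. A plain-Python sketch (hypothetical; Spark's real API is the Scala trait org.apache.spark.sql.jdbc.JdbcDialect with canHandle and getCatalystType) of the dialect-registry pattern a Derby dialect would plug into:

```python
JAVA_OBJECT = 2000  # java.sql.Types.JAVA_OBJECT, reported for Derby UDT columns

class DerbyDialect:
    def can_handle(self, url):
        # A dialect claims responsibility for URLs by prefix.
        return url.startswith("jdbc:derby:")

    def get_catalyst_type(self, sql_type):
        # Map Derby UDT columns to an opaque binary type instead of
        # failing; returning None defers to the default type mapping.
        return "BinaryType" if sql_type == JAVA_OBJECT else None

registered = [DerbyDialect()]

def dialect_for(url):
    # First registered dialect that claims the URL wins; None means
    # fall back to the generic JDBC path.
    for d in registered:
        if d.can_handle(url):
            return d
    return None
```

Whether JAVA_OBJECT should surface as binary or something richer is a design choice the pull request would have to settle; the sketch only shows where the hook lives.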
[jira] [Resolved] (SPARK-10770) SparkPlan.executeCollect/executeTake should return InternalRow rather than external Row
[ https://issues.apache.org/jira/browse/SPARK-10770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-10770. - Resolution: Fixed Fix Version/s: 1.6.0 > SparkPlan.executeCollect/executeTake should return InternalRow rather than > external Row > --- > > Key: SPARK-10770 > URL: https://issues.apache.org/jira/browse/SPARK-10770 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 1.6.0 > >
[jira] [Updated] (SPARK-10851) Exception not failing R applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-10851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-10851: -- Assignee: Sun Rui > Exception not failing R applications (in yarn cluster mode) > --- > > Key: SPARK-10851 > URL: https://issues.apache.org/jira/browse/SPARK-10851 > Project: Spark > Issue Type: Bug > Components: SparkR, YARN >Affects Versions: 1.5.0, 1.5.1 >Reporter: Zsolt Tóth >Assignee: Sun Rui > Fix For: 1.6.0 > > > The bug is the R version of SPARK-7736. The R script fails with an exception > but the Yarn application status is SUCCEEDED.
[jira] [Resolved] (SPARK-10851) Exception not failing R applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-10851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-10851. --- Resolution: Fixed Fix Version/s: 1.6.0 Target Version/s: 1.6.0 > Exception not failing R applications (in yarn cluster mode) > --- > > Key: SPARK-10851 > URL: https://issues.apache.org/jira/browse/SPARK-10851 > Project: Spark > Issue Type: Bug > Components: SparkR, YARN >Affects Versions: 1.5.0, 1.5.1 >Reporter: Zsolt Tóth >Assignee: Sun Rui > Fix For: 1.6.0 > > > The bug is the R version of SPARK-7736. The R script fails with an exception > but the Yarn application status is SUCCEEDED.
[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior
[ https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938337#comment-14938337 ] Xin Ren commented on SPARK-9344: This is my first time trying; is it OK if I work on this one? Thanks a lot. :P > Spark SQL documentation does not clarify INSERT INTO behavior > - > > Key: SPARK-9344 > URL: https://issues.apache.org/jira/browse/SPARK-9344 > Project: Spark > Issue Type: Documentation > Components: Documentation, SQL >Affects Versions: 1.4.1 >Reporter: Simeon Simeonov >Priority: Minor > Labels: documentation, sql > > The Spark SQL documentation does not address {{INSERT INTO}} behavior. The > section on Hive compatibility is misleading as it claims support for "the > vast majority of Hive features". The user mailing list has conflicting > information, including posts that claim {{INSERT INTO}} support targeting 1.0. > In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.
[jira] [Commented] (SPARK-10857) SQL injection bug in JdbcDialect.getTableExistsQuery()
[ https://issues.apache.org/jira/browse/SPARK-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937732#comment-14937732 ] Suresh Thalamati commented on SPARK-10857: -- One issue I ran into with the getSchema() call: even if Spark runs on Java 7 or above, the JDBC driver versions customers use may not support getSchema(). I tried a couple of databases I had and got an error on getSchema() from each. It is possible I have old drivers. postgresql-9.3-1101-jdbc4.jar /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/bin/java - Exception in thread "main" java.sql.SQLFeatureNotSupportedException: Method org.postgresql.jdbc4.Jdbc4Connection.getSchema() is not yet implemented. at org.postgresql.Driver.notImplemented(Driver.java:729) at org.postgresql.jdbc4.AbstractJdbc4Connection.getSchema(AbstractJdbc4Connection.java:239) MySQL: Implementation-Version: 5.1.17-SNAPSHOT /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/ Exception in thread "main" java.sql.SQLFeatureNotSupportedException: Not supported at com.mysql.jdbc.JDBC4Connection.getSchema(JDBC4Connection.java:253) ... > SQL injection bug in JdbcDialect.getTableExistsQuery() > -- > > Key: SPARK-10857 > URL: https://issues.apache.org/jira/browse/SPARK-10857 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Rick Hillegas >Priority: Minor > > All of the implementations of this method involve constructing a query by > concatenating boilerplate text with a user-supplied name. This looks like a > SQL injection bug to me. > A better solution would be to call java.sql.DatabaseMetaData.getTables() to > implement this method, using the catalog and schema which are available from > Connection.getCatalog() and Connection.getSchema(). This would not work on > Java 6 because Connection.getSchema() was introduced in Java 7. However, the > solution would work for more modern JVMs. Limiting the vulnerability to > obsolete JVMs would at least be an improvement over the current situation. > Java 6 has been end-of-lifed and is not an appropriate platform for users who > are concerned about security.
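The metadata-based approach Rick describes can be illustrated outside of Spark. A small Python/sqlite3 sketch (illustrative only; Spark's fix would go through java.sql.DatabaseMetaData.getTables() rather than a catalog query) contrasts concatenating the table name into SQL with binding it as data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER)")

def table_exists_unsafe(name):
    # Mirrors the getTableExistsQuery() pattern: boilerplate text plus a
    # user-supplied name concatenated straight into the statement.
    try:
        conn.execute("SELECT 1 FROM %s LIMIT 1" % name)
        return True
    except sqlite3.Error:
        return False

def table_exists_safe(name):
    # The name is bound as a parameter against the catalog, analogous to
    # passing it as the tableNamePattern argument of getTables(): it is
    # treated as data and can never alter the statement text.
    cur = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,))
    return cur.fetchone() is not None
```

In the unsafe variant a crafted "table name" is interpreted as SQL; in the safe one it is just a string that matches no catalog entry.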
[jira] [Closed] (SPARK-3862) MultiWayBroadcastInnerHashJoin
[ https://issues.apache.org/jira/browse/SPARK-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-3862. -- Resolution: Won't Fix Closing this one for now since I think we can do something better with codegen without building specialized operators. > MultiWayBroadcastInnerHashJoin > -- > > Key: SPARK-3862 > URL: https://issues.apache.org/jira/browse/SPARK-3862 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin > > It is common to have a single fact table inner join many small dimension > tables. We can exploit this fact and create a MultiWayBroadcastInnerHashJoin > (or maybe just MultiwayDimensionJoin) operator that optimizes for this > pattern.
[jira] [Commented] (SPARK-3862) MultiWayBroadcastInnerHashJoin
[ https://issues.apache.org/jira/browse/SPARK-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938243#comment-14938243 ] Reynold Xin commented on SPARK-3862: David, Thanks. Let's chat there. Since I created the ticket, I have new thoughts on how we can make something better with codegen, rather than writing specialized operators. > MultiWayBroadcastInnerHashJoin > -- > > Key: SPARK-3862 > URL: https://issues.apache.org/jira/browse/SPARK-3862 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin > > It is common to have a single fact table inner join many small dimension > tables. We can exploit this fact and create a MultiWayBroadcastInnerHashJoin > (or maybe just MultiwayDimensionJoin) operator that optimizes for this > pattern.
[jira] [Commented] (SPARK-10515) When killing executor, the pending replacement executors will be lost
[ https://issues.apache.org/jira/browse/SPARK-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936475#comment-14936475 ] Apache Spark commented on SPARK-10515: -- User 'KaiXinXiaoLei' has created a pull request for this issue: https://github.com/apache/spark/pull/8945 > When killing executor, the pending replacement executors will be lost > - > > Key: SPARK-10515 > URL: https://issues.apache.org/jira/browse/SPARK-10515 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.1 >Reporter: KaiXinXIaoLei > Fix For: 1.6.0 > >
[jira] [Resolved] (SPARK-10736) Use 1 for all ratings if $(ratingCol) = ""
[ https://issues.apache.org/jira/browse/SPARK-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-10736. --- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8937 [https://github.com/apache/spark/pull/8937] > Use 1 for all ratings if $(ratingCol) = "" > -- > > Key: SPARK-10736 > URL: https://issues.apache.org/jira/browse/SPARK-10736 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Priority: Minor > Fix For: 1.6.0 > > > For some implicit datasets, ratings may not exist in the training data. In > this case, we can assume all observed pairs to be positive and treat their > ratings as 1. This should happen when users set ratingCol to an empty string.
[jira] [Commented] (SPARK-10800) Flaky test: org.apache.spark.deploy.StandaloneDynamicAllocationSuite
[ https://issues.apache.org/jira/browse/SPARK-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936521#comment-14936521 ] Apache Spark commented on SPARK-10800: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/8946 > Flaky test: org.apache.spark.deploy.StandaloneDynamicAllocationSuite > > > Key: SPARK-10800 > URL: https://issues.apache.org/jira/browse/SPARK-10800 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu > Labels: flaky-test > > Saw several failures on master: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/3622/HADOOP_PROFILE=hadoop-2.4,label=spark-test/testReport/junit/org.apache.spark.deploy/ > {code} > org.apache.spark.deploy.StandaloneDynamicAllocationSuite.dynamic allocation > default behavior > Failing for the past 1 build (Since Failed#3622 ) > Took 0.12 sec. > add description > Error Message > 1 did not equal 2 > Stacktrace > org.scalatest.exceptions.TestFailedException: 1 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply$mcV$sp(StandaloneDynamicAllocationSuite.scala:78) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply(StandaloneDynamicAllocationSuite.scala:73) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply(StandaloneDynamicAllocationSuite.scala:73) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at 
org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(StandaloneDynamicAllocationSuite.scala:33) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite.runTest(StandaloneDynamicAllocationSuite.scala:33) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite.org$scalatest$BeforeAndAfterAll$$super$run(StandaloneDynamicAllocationSuite.scala:33) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at >
[jira] [Updated] (SPARK-10869) Auto-normalization of semi-structured schema from a dataframe
[ https://issues.apache.org/jira/browse/SPARK-10869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Genini updated SPARK-10869: -- Target Version/s: (was: 1.5.1) > Auto-normalization of semi-structured schema from a dataframe > - > > Key: SPARK-10869 > URL: https://issues.apache.org/jira/browse/SPARK-10869 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 1.5.1 >Reporter: Julien Genini >Priority: Minor > Original Estimate: 4h > Remaining Estimate: 4h > > Today, you can get a multi-depth schema from a semi-structured dataframe > (XML, JSON, etc.). That is not easy to deal with in data warehousing, where > it is better to normalize the data. > I propose adding an option when you get the schema (normalized, default False). > The returned JSON schema would then contain the normalized path for each > field, and the list of the different node levels. > df = sqlContext.read.json(jsonPath) > jsonLinearSchema = df.schema.jsonValue(normalized=True) > >> > {code} > {'fields': [{'fullPathName': 'SiteXML.BusinessDate', > 'metadata': {}, > 'name': 'BusinessDate', > 'nullable': True, > 'type': 'string'}, > {'fullPathName': 'SiteXML.Site_List.Site.Id_Group', > 'metadata': {}, > 'name': 'Id_Group', > 'nullable': True, > 'type': 'string'}, > {'fullPathName': 'SiteXML.Site_List.Site.Id_Site', > 'metadata': {}, > 'name': 'Id_Site', > 'nullable': True, > 'type': 'string'}, > {'fullPathName': 'SiteXML.Site_List.Site.libelle', > 'metadata': {}, > 'name': 'libelle', > 'nullable': True, > 'type': 'string'}, > {'fullPathName': 'SiteXML.Site_List.Site.libelle_Group', > 'metadata': {}, > 'name': 'libelle_Group', > 'nullable': True, > 'type': 'string'}, > {'fullPathName': 'SiteXML.TimeStamp', > 'metadata': {}, > 'name': 'TimeStamp', > 'nullable': True, > 'type': 'string'}], > 'nodes': [{'fieldsFullPathName': ['SiteXML.BusinessDate', > 'SiteXML.TimeStamp'], > 'fullPathName': 'SiteXML', > 'nbFields': 2}, > {'fieldsFullPathName': ['SiteXML.Site_List.Site.Id_Group', > 'SiteXML.Site_List.Site.Id_Site', > 'SiteXML.Site_List.Site.libelle', > 'SiteXML.Site_List.Site.libelle_Group'], > 'fullPathName': 'SiteXML.Site_List.Site', > 'nbFields': 4}]} > {code}
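The proposed normalized output above boils down to walking the nested schema and emitting each leaf with its dot-separated full path. A small Python sketch of that walk (a hypothetical helper, not an existing PySpark API), operating on a dict shaped like StructType's jsonValue():

```python
def flatten_fields(schema, prefix=""):
    """Return the leaf fields of a nested (struct-style) schema dict,
    each annotated with its dot-separated fullPathName."""
    out = []
    for f in schema["fields"]:
        path = f"{prefix}.{f['name']}" if prefix else f["name"]
        t = f["type"]
        if isinstance(t, dict) and "fields" in t:  # nested struct: recurse
            out.extend(flatten_fields(t, path))
        else:                                      # leaf field: record it
            out.append({"fullPathName": path, "name": f["name"],
                        "nullable": f.get("nullable", True), "type": t})
    return out

# A schema shaped like the SiteXML example in the ticket (trimmed).
schema = {"fields": [
    {"name": "SiteXML", "nullable": True, "type": {"fields": [
        {"name": "BusinessDate", "type": "string", "nullable": True},
        {"name": "Site_List", "nullable": True, "type": {"fields": [
            {"name": "Site", "nullable": True, "type": {"fields": [
                {"name": "Id_Site", "type": "string", "nullable": True},
            ]}},
        ]}},
    ]}},
]}
```

Grouping the resulting paths by their parent prefix yields the 'nodes' list from the proposal.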
[jira] [Commented] (SPARK-6028) Provide an alternative RPC implementation based on the network transport module
[ https://issues.apache.org/jira/browse/SPARK-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936411#comment-14936411 ] Apache Spark commented on SPARK-6028: - User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/8944 > Provide an alternative RPC implementation based on the network transport > module > --- > > Key: SPARK-6028 > URL: https://issues.apache.org/jira/browse/SPARK-6028 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Reynold Xin >Assignee: Shixiong Zhu >Priority: Critical > Fix For: 1.6.0 > > > Network transport module implements a low level RPC interface. We can build a > new RPC implementation on top of that to replace Akka's. > Design document: > https://docs.google.com/document/d/1CF5G6rGVQMKSyV_QKo4D2M-x6rxz5x1Ew7aK3Uq6u8c/edit?usp=sharing
[jira] [Commented] (SPARK-10687) Discuss nonparametric survival analysis model
[ https://issues.apache.org/jira/browse/SPARK-10687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936544#comment-14936544 ] Yanbo Liang commented on SPARK-10687: - [~mengxr] Because the Cox Proportional Hazards model is not easy to implement efficiently in Spark, I think the most commonly used non-parametric survival analysis model is the Kaplan-Meier model. But it only gives us an “average” view of the population rather than a regression over covariates. I have updated the design document of SPARK-8518; please feel free to comment on the section "Planning for the future". > Discuss nonparametric survival analysis model > - > > Key: SPARK-10687 > URL: https://issues.apache.org/jira/browse/SPARK-10687 > Project: Spark > Issue Type: Brainstorming > Components: ML >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > Created this JIRA to discuss nonparametric survival models and feasibility to > implement them on Spark. Please also check the design doc posted on > SPARK-8518.
[jira] [Updated] (SPARK-10736) Use 1 for all ratings if $(ratingCol) = ""
[ https://issues.apache.org/jira/browse/SPARK-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10736: -- Assignee: Yanbo Liang > Use 1 for all ratings if $(ratingCol) = "" > -- > > Key: SPARK-10736 > URL: https://issues.apache.org/jira/browse/SPARK-10736 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Yanbo Liang >Priority: Minor > Fix For: 1.6.0 > > > For some implicit datasets, ratings may not exist in the training data. In > this case, we can assume all observed pairs to be positive and treat their > ratings as 1. This should happen when users set ratingCol to an empty string.
[jira] [Resolved] (SPARK-10811) Minimize array copying cost in Parquet converters
[ https://issues.apache.org/jira/browse/SPARK-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10811. Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8907 [https://github.com/apache/spark/pull/8907] > Minimize array copying cost in Parquet converters > - > > Key: SPARK-10811 > URL: https://issues.apache.org/jira/browse/SPARK-10811 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > Fix For: 1.6.0 > > > When converting Parquet {{Binary}} values to {{UTF8String}} and {{Decimal}} > values, there exists unnecessary array copying costs ({{Binary.getBytes()}}), > which can be eliminated for better performance.
[jira] [Commented] (SPARK-10268) Add @Since annotation to ml.tree
[ https://issues.apache.org/jira/browse/SPARK-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936485#comment-14936485 ] Hiroshi Takahashi commented on SPARK-10268: --- [~mengxr] Could you take a look? > Add @Since annotation to ml.tree > > > Key: SPARK-10268 > URL: https://issues.apache.org/jira/browse/SPARK-10268 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML >Reporter: Xiangrui Meng >Priority: Minor > Labels: starter >
[jira] [Assigned] (SPARK-10886) Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md
[ https://issues.apache.org/jira/browse/SPARK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10886: Assignee: (was: Apache Spark) > Random RDD creation example fix in MLlib statistics programming guide - > mllib-statistics.md > --- > > Key: SPARK-10886 > URL: https://issues.apache.org/jira/browse/SPARK-10886 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Jayant Shekhar >Priority: Minor > > Creating Random RDDs had the following line in the example for Random Data > Generation in the MLlib statistics programming guide: > val u = normalRDD(sc, 100L, 10) > It should be: > val u = RandomRDDs.normalRDD(sc, 100L, 10) > It applies to both the Scala and Java examples.
[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior
[ https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938811#comment-14938811 ] Xin Ren commented on SPARK-9344: Hi, I'll try to fix the docs to get a sense of how the commit process works; is that OK? Or could you please recommend some easy tickets so I can get started? Thanks a lot. > Spark SQL documentation does not clarify INSERT INTO behavior > - > > Key: SPARK-9344 > URL: https://issues.apache.org/jira/browse/SPARK-9344 > Project: Spark > Issue Type: Documentation > Components: Documentation, SQL >Affects Versions: 1.4.1 >Reporter: Simeon Simeonov >Priority: Minor > Labels: documentation, sql > > The Spark SQL documentation does not address {{INSERT INTO}} behavior. The > section on Hive compatibility is misleading as it claims support for "the > vast majority of Hive features". The user mailing list has conflicting > information, including posts that claim {{INSERT INTO}} support targeting 1.0. > In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.
[jira] [Created] (SPARK-10887) Build HashRelation outside of HashJoinNode
Yin Huai created SPARK-10887: Summary: Build HashRelation outside of HashJoinNode Key: SPARK-10887 URL: https://issues.apache.org/jira/browse/SPARK-10887 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Right now, HashJoinNode builds a HashRelation for the build side. We can move this step out of the node, so that HashJoinNode can be used for both broadcast joins and shuffled joins.
[jira] [Created] (SPARK-10888) Add as.DataFrame as a synonym for createDataFrame
Narine Kokhlikyan created SPARK-10888: - Summary: Add as.DataFrame as a synonym for createDataFrame Key: SPARK-10888 URL: https://issues.apache.org/jira/browse/SPARK-10888 Project: Spark Issue Type: Sub-task Reporter: Narine Kokhlikyan Priority: Minor as.DataFrame is a more R-style signature. Also, I'd like to know if we could make the context, e.g. sqlContext, global, so that we do not have to specify it as an argument each time we create a DataFrame.
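The synonym itself is a small change; a minimal sketch of the idea (in Python, with hypothetical stand-ins for the SparkR functions; in R the binding would be `as.DataFrame <- createDataFrame`):

```python
# Hypothetical stand-in for the real constructor; just materializes rows.
def createDataFrame(sqlContext, data):
    return list(data)

# The proposed synonym: a second name bound to the same function, so both
# spellings accept the same arguments and return the same result.
as_DataFrame = createDataFrame

rows = [(1, "a"), (2, "b")]
same = as_DataFrame(None, rows) == createDataFrame(None, rows)
```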
[jira] [Commented] (SPARK-10669) Link to each language's API in codetabs in ML docs: spark.mllib
[ https://issues.apache.org/jira/browse/SPARK-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938897#comment-14938897 ] Xin Ren commented on SPARK-10669: - May I take a try at this one? Thanks a lot > Link to each language's API in codetabs in ML docs: spark.mllib > --- > > Key: SPARK-10669 > URL: https://issues.apache.org/jira/browse/SPARK-10669 > Project: Spark > Issue Type: Documentation > Components: Documentation, MLlib >Reporter: Joseph K. Bradley > > In the Markdown docs for the spark.mllib Programming Guide, we have code > examples with codetabs for each language. We should link to each language's > API docs within the corresponding codetab, but we are inconsistent about > this. For an example of what we want to do, see the "ChiSqSelector" section > in > [https://github.com/apache/spark/blob/64743870f23bffb8d96dcc8a0181c1452782a151/docs/mllib-feature-extraction.md] > This JIRA is just for spark.mllib, not spark.ml
[jira] [Assigned] (SPARK-10886) Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md
[ https://issues.apache.org/jira/browse/SPARK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10886: Assignee: Apache Spark > Random RDD creation example fix in MLlib statistics programming guide - > mllib-statistics.md > --- > > Key: SPARK-10886 > URL: https://issues.apache.org/jira/browse/SPARK-10886 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Jayant Shekhar >Assignee: Apache Spark >Priority: Minor > > Creating Random RDDs had the following line in the example for Random Data > Generation in the MLlib statistics programming guide: > val u = normalRDD(sc, 100L, 10) > It should be: > val u = RandomRDDs.normalRDD(sc, 100L, 10) > It applies to both the Scala and Java examples.
[jira] [Commented] (SPARK-10886) Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md
[ https://issues.apache.org/jira/browse/SPARK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938797#comment-14938797 ] Apache Spark commented on SPARK-10886: -- User 'jayantshekhar' has created a pull request for this issue: https://github.com/apache/spark/pull/8951 > Random RDD creation example fix in MLlib statistics programming guide - > mllib-statistics.md > --- > > Key: SPARK-10886 > URL: https://issues.apache.org/jira/browse/SPARK-10886 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Jayant Shekhar >Priority: Minor > > Creating Random RDDs had the following line in the example for Random Data > Generation in the MLlib statistics programming guide: > val u = normalRDD(sc, 100L, 10) > It should be: > val u = RandomRDDs.normalRDD(sc, 100L, 10) > It applies to both the Scala and Java examples.
[jira] [Assigned] (SPARK-10665) Connect the local iterators with the planner
[ https://issues.apache.org/jira/browse/SPARK-10665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai reassigned SPARK-10665: Assignee: Yin Huai > Connect the local iterators with the planner > > > Key: SPARK-10665 > URL: https://issues.apache.org/jira/browse/SPARK-10665 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Yin Huai > > After creating these local iterators, we'd need to actually use them.
[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior
[ https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938814#comment-14938814 ] Simeon Simeonov commented on SPARK-9344: /cc [~joshrosen] [~andrewor14] > Spark SQL documentation does not clarify INSERT INTO behavior > - > > Key: SPARK-9344 > URL: https://issues.apache.org/jira/browse/SPARK-9344 > Project: Spark > Issue Type: Documentation > Components: Documentation, SQL >Affects Versions: 1.4.1 >Reporter: Simeon Simeonov >Priority: Minor > Labels: documentation, sql > > The Spark SQL documentation does not address {{INSERT INTO}} behavior. The > section on Hive compatibility is misleading as it claims support for "the > vast majority of Hive features". The user mailing list has conflicting > information, including posts that claim {{INSERT INTO}} support targeting 1.0. > In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.
[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior
[ https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938825#comment-14938825 ] Simeon Simeonov commented on SPARK-9344: [~joshrosen] when I logged the bug I was using {{HiveContext}}. Given how many Spark SQL bugs are logged here, e.g., issues with view support, it does make sense for the SQL docs to become more reality-based. :) > Spark SQL documentation does not clarify INSERT INTO behavior > - > > Key: SPARK-9344 > URL: https://issues.apache.org/jira/browse/SPARK-9344 > Project: Spark > Issue Type: Documentation > Components: Documentation, SQL >Affects Versions: 1.4.1 >Reporter: Simeon Simeonov >Priority: Minor > Labels: documentation, sql > > The Spark SQL documentation does not address {{INSERT INTO}} behavior. The > section on Hive compatibility is misleading as it claims support for "the > vast majority of Hive features". The user mailing list has conflicting > information, including posts that claim {{INSERT INTO}} support targeting 1.0. > In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.
[jira] [Created] (SPARK-10886) Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md
Jayant Shekhar created SPARK-10886: -- Summary: Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md Key: SPARK-10886 URL: https://issues.apache.org/jira/browse/SPARK-10886 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.5.0 Reporter: Jayant Shekhar Priority: Minor Creating Random RDDs had the following line in the example for Random Data Generation in the MLlib statistics programming guide: val u = normalRDD(sc, 100L, 10) It should be: val u = RandomRDDs.normalRDD(sc, 100L, 10) It applies to both the Scala and Java examples.
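Why the doc's original line fails can be shown with a toy Python analogy (the `RandomRDDs` class below is a hypothetical mock, not MLlib): `normalRDD` is only reachable through the `RandomRDDs` object, so the unqualified call cannot resolve the name at all.

```python
# Hypothetical mock mirroring the Scala API shape: normalRDD lives on
# RandomRDDs, not in the top-level namespace.
class RandomRDDs:
    @staticmethod
    def normalRDD(sc, size, num_partitions):
        # stand-in for an RDD of `size` samples from N(0, 1)
        return [0.0] * num_partitions

try:
    normalRDD(None, 100, 10)  # unqualified, as in the broken doc example
    unqualified_resolves = True
except NameError:
    unqualified_resolves = False

u = RandomRDDs.normalRDD(None, 100, 10)  # qualified, as in the proposed fix
```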
[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior
[ https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938806#comment-14938806 ] Simeon Simeonov commented on SPARK-9344: Are you suggesting to fix the docs or the code? > Spark SQL documentation does not clarify INSERT INTO behavior > - > > Key: SPARK-9344 > URL: https://issues.apache.org/jira/browse/SPARK-9344 > Project: Spark > Issue Type: Documentation > Components: Documentation, SQL >Affects Versions: 1.4.1 >Reporter: Simeon Simeonov >Priority: Minor > Labels: documentation, sql > > The Spark SQL documentation does not address {{INSERT INTO}} behavior. The > section on Hive compatibility is misleading as it claims support for "the > vast majority of Hive features". The user mailing list has conflicting > information, including posts that claim {{INSERT INTO}} support targeting 1.0. > In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.
[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior
[ https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938839#comment-14938839 ] Josh Rosen commented on SPARK-9344: --- [~simeons], I just grepped through Spark's code base looking for {{INSERT INTO}} and found some test cases which use it: https://github.com/apache/spark/blob/418e5e4cbdaab87addb91ac0bb2245ff0213ac81/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala#L1015 Can you provide an example that produces the parse error, so we can reproduce and triage the problem? > Spark SQL documentation does not clarify INSERT INTO behavior > - > > Key: SPARK-9344 > URL: https://issues.apache.org/jira/browse/SPARK-9344 > Project: Spark > Issue Type: Documentation > Components: Documentation, SQL >Affects Versions: 1.4.1 >Reporter: Simeon Simeonov >Priority: Minor > Labels: documentation, sql > > The Spark SQL documentation does not address {{INSERT INTO}} behavior. The > section on Hive compatibility is misleading as it claims support for "the > vast majority of Hive features". The user mailing list has conflicting > information, including posts that claim {{INSERT INTO}} support targeting 1.0. > In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.
[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior
[ https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938819#comment-14938819 ] Josh Rosen commented on SPARK-9344: --- [~simeons], are you using HiveContext or SQLContext? If Spark _does_ support {{INSERT INTO}} then I suspect that it's only via the HiveContext interface, not plain SQLContext. > Spark SQL documentation does not clarify INSERT INTO behavior > - > > Key: SPARK-9344 > URL: https://issues.apache.org/jira/browse/SPARK-9344 > Project: Spark > Issue Type: Documentation > Components: Documentation, SQL >Affects Versions: 1.4.1 >Reporter: Simeon Simeonov >Priority: Minor > Labels: documentation, sql > > The Spark SQL documentation does not address {{INSERT INTO}} behavior. The > section on Hive compatibility is misleading as it claims support for "the > vast majority of Hive features". The user mailing list has conflicting > information, including posts that claim {{INSERT INTO}} support targeting 1.0. > In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.
[jira] [Commented] (SPARK-10886) Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md
[ https://issues.apache.org/jira/browse/SPARK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938782#comment-14938782 ] Jayant Shekhar commented on SPARK-10886: Am in the process of PR for it. > Random RDD creation example fix in MLlib statistics programming guide - > mllib-statistics.md > --- > > Key: SPARK-10886 > URL: https://issues.apache.org/jira/browse/SPARK-10886 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Jayant Shekhar >Priority: Minor > > Creating Random RDDs had the following line in the example for Random Data > Generation in the MLlib statistics programming guide: > val u = normalRDD(sc, 100L, 10) > It should be: > val u = RandomRDDs.normalRDD(sc, 100L, 10) > It applies to both the Scala and Java examples.
[jira] [Comment Edited] (SPARK-10886) Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md
[ https://issues.apache.org/jira/browse/SPARK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938782#comment-14938782 ] Jayant Shekhar edited comment on SPARK-10886 at 9/30/15 8:27 PM: - Am in the process of creating a PR for it. was (Author: jayants): Am in the process of PR for it. > Random RDD creation example fix in MLlib statistics programming guide - > mllib-statistics.md > --- > > Key: SPARK-10886 > URL: https://issues.apache.org/jira/browse/SPARK-10886 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Jayant Shekhar >Priority: Minor > > Creating Random RDDs had the following line in the example for Random Data > Generation in the MLlib statistics programming guide: > val u = normalRDD(sc, 100L, 10) > It should be: > val u = RandomRDDs.normalRDD(sc, 100L, 10) > It applies to both the Scala and Java examples.
[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior
[ https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938918#comment-14938918 ] Simeon Simeonov commented on SPARK-9344: [~joshrosen] Here is the reproducible test case you can try in {{spark-shell}}: {code} import org.apache.spark.sql.hive.HiveContext val ctx = sqlContext.asInstanceOf[HiveContext] import ctx.implicits._ (1 to 5).map(Tuple1.apply).toDF("w_int").write.save("test_data1") (6 to 9).map(Tuple1.apply).toDF("w_int").write.save("test_data2") ctx.sql("insert into table test_data1 select * from test_data2") {code} This fails with: {code} scala> ctx.sql("insert into table test_data1 select * from test_data2") 15/09/30 17:32:34 INFO ParseDriver: Parsing command: insert into table test_data1 select * from test_data2 15/09/30 17:32:34 INFO ParseDriver: Parse Completed 15/09/30 17:32:34 INFO HiveMetaStore: 0: get_table : db=default tbl=test_data1 15/09/30 17:32:34 INFO audit: ugi=sim ip=unknown-ip-addr cmd=get_table : db=default tbl=test_data1 org.apache.spark.sql.AnalysisException: no such table test_data1; line 1 pos 18 at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:225) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:231) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:229) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:221) at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:212) at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:229) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:219) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:61) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:59) at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111) at scala.collection.immutable.List.foldLeft(List.scala:84) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:59) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:51) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:51) at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:933) at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:933) at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931) at org.apache.spark.sql.DataFrame.(DataFrame.scala:131) at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:39) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:44) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:46) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:48) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:50) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:52) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:54) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:56) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:58) at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:60) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:62) at $iwC$$iwC$$iwC$$iwC$$iwC.(:64) at $iwC$$iwC$$iwC$$iwC.(:66) at $iwC$$iwC$$iwC.(:68) at $iwC$$iwC.(:70) at $iwC.(:72) at (:74) at .(:78) at .() at .(:7) at .() at $print() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at
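One plausible reading of the repro above: {{write.save(path)}} writes files to a path without registering anything in the metastore, so the later {{INSERT INTO TABLE test_data1}} cannot resolve the name, whereas {{saveAsTable}} would register it. A toy Python sketch of that distinction (dicts as hypothetical stand-ins for the metastore and the filesystem; this is a guess at the failure mode, not Spark's code):

```python
catalog = {}     # stand-in for the Hive metastore
filesystem = {}  # stand-in for HDFS paths

def save(path, rows):
    filesystem[path] = rows        # files written; catalog untouched

def save_as_table(name, rows):
    filesystem[name] = rows
    catalog[name] = name           # table also registered in the metastore

def insert_into_table(name, rows):
    if name not in catalog:
        raise LookupError(f"no such table {name}")  # mirrors the AnalysisException
    filesystem[catalog[name]] += rows

save("test_data1", [1, 2, 3])      # like df.write.save("test_data1")
try:
    insert_into_table("test_data1", [4])
    insert_worked_on_path = True
except LookupError:
    insert_worked_on_path = False

save_as_table("test_data2", [1, 2, 3])   # registered, so INSERT can find it
insert_into_table("test_data2", [4])
```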
[jira] [Commented] (SPARK-10263) Add @Since annotation to ml.param and ml.*
[ https://issues.apache.org/jira/browse/SPARK-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936630#comment-14936630 ] Hiroshi Takahashi commented on SPARK-10263: --- [~mengxr] Could you take a look? > Add @Since annotation to ml.param and ml.* > -- > > Key: SPARK-10263 > URL: https://issues.apache.org/jira/browse/SPARK-10263 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML >Reporter: Xiangrui Meng >Priority: Minor > Labels: starter >
[jira] [Assigned] (SPARK-7869) Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns
[ https://issues.apache.org/jira/browse/SPARK-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7869: --- Assignee: (was: Apache Spark) > Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns > -- > > Key: SPARK-7869 > URL: https://issues.apache.org/jira/browse/SPARK-7869 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 1.3.0, 1.3.1 > Environment: Spark 1.3.1 >Reporter: Brad Willard >Priority: Minor > > Most of our tables load into dataframes just fine with postgres. However we > have a number of tables leveraging the JSONB datatype. Spark will error and > refuse to load this table. While asking for Spark to support JSONB might be a > tall order in the short term, it would be great if Spark would at least load > the table ignoring the columns it can't load or have it be an option. > pdf = sql_context.load(source="jdbc", url=url, dbtable="table_of_json") > Py4JJavaError: An error occurred while calling o41.load. 
> : java.sql.SQLException: Unsupported type > at org.apache.spark.sql.jdbc.JDBCRDD$.getCatalystType(JDBCRDD.scala:78) > at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:112) > at org.apache.spark.sql.jdbc.JDBCRelation.(JDBCRelation.scala:133) > at > org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:121) > at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219) > at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697) > at org.apache.spark.sql.SQLContext.load(SQLContext.scala:685) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) > at py4j.Gateway.invoke(Gateway.java:259) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:207) > at java.lang.Thread.run(Thread.java:745)
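The behavior the reporter requests — load the table but skip columns whose JDBC type has no mapping, instead of failing the whole load — can be sketched as a toy type resolver (hypothetical names and type map, not Spark's JDBCRDD code):

```python
# Hypothetical JDBC-to-catalyst type map; jsonb has no entry, which is
# what triggers the "Unsupported type" failure in the trace above.
TYPE_MAP = {"int4": "IntegerType", "text": "StringType"}

def resolve_table(columns, skip_unsupported=False):
    resolved = []
    for name, jdbc_type in columns:
        catalyst = TYPE_MAP.get(jdbc_type)
        if catalyst is None:
            if skip_unsupported:
                continue                       # requested option: drop the column
            raise ValueError("Unsupported type")  # mirrors today's hard error
        resolved.append((name, catalyst))
    return resolved

cols = [("id", "int4"), ("doc", "jsonb")]
try:
    resolve_table(cols)                        # current behavior: whole load fails
    whole_table_loads = True
except ValueError:
    whole_table_loads = False

partial = resolve_table(cols, skip_unsupported=True)  # proposed opt-in behavior
```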
[jira] [Assigned] (SPARK-7869) Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns
[ https://issues.apache.org/jira/browse/SPARK-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7869: --- Assignee: Apache Spark > Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns > -- > > Key: SPARK-7869 > URL: https://issues.apache.org/jira/browse/SPARK-7869 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 1.3.0, 1.3.1 > Environment: Spark 1.3.1 >Reporter: Brad Willard >Assignee: Apache Spark >Priority: Minor > > Most of our tables load into dataframes just fine with postgres. However we > have a number of tables leveraging the JSONB datatype. Spark will error and > refuse to load this table. While asking for Spark to support JSONB might be a > tall order in the short term, it would be great if Spark would at least load > the table ignoring the columns it can't load or have it be an option. > pdf = sql_context.load(source="jdbc", url=url, dbtable="table_of_json") > Py4JJavaError: An error occurred while calling o41.load. 
> : java.sql.SQLException: Unsupported type > at org.apache.spark.sql.jdbc.JDBCRDD$.getCatalystType(JDBCRDD.scala:78) > at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:112) > at org.apache.spark.sql.jdbc.JDBCRelation.(JDBCRelation.scala:133) > at > org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:121) > at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219) > at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697) > at org.apache.spark.sql.SQLContext.load(SQLContext.scala:685) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) > at py4j.Gateway.invoke(Gateway.java:259) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:207) > at java.lang.Thread.run(Thread.java:745)
[jira] [Commented] (SPARK-7869) Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns
[ https://issues.apache.org/jira/browse/SPARK-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936695#comment-14936695 ] Apache Spark commented on SPARK-7869: - User '0x0FFF' has created a pull request for this issue: https://github.com/apache/spark/pull/8948 > Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns > -- > > Key: SPARK-7869 > URL: https://issues.apache.org/jira/browse/SPARK-7869 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 1.3.0, 1.3.1 > Environment: Spark 1.3.1 >Reporter: Brad Willard >Priority: Minor > > Most of our tables load into dataframes just fine with postgres. However we > have a number of tables leveraging the JSONB datatype. Spark will error and > refuse to load this table. While asking for Spark to support JSONB might be a > tall order in the short term, it would be great if Spark would at least load > the table ignoring the columns it can't load or have it be an option. > pdf = sql_context.load(source="jdbc", url=url, dbtable="table_of_json") > Py4JJavaError: An error occurred while calling o41.load. 
> : java.sql.SQLException: Unsupported type > at org.apache.spark.sql.jdbc.JDBCRDD$.getCatalystType(JDBCRDD.scala:78) > at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:112) > at org.apache.spark.sql.jdbc.JDBCRelation.(JDBCRelation.scala:133) > at > org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:121) > at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219) > at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697) > at org.apache.spark.sql.SQLContext.load(SQLContext.scala:685) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) > at py4j.Gateway.invoke(Gateway.java:259) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:207) > at java.lang.Thread.run(Thread.java:745)
[jira] [Created] (SPARK-10880) Hive module build test failed
Jean-Baptiste Onofré created SPARK-10880: Summary: Hive module build test failed Key: SPARK-10880 URL: https://issues.apache.org/jira/browse/SPARK-10880 Project: Spark Issue Type: Bug Components: Tests Reporter: Jean-Baptiste Onofré On master, sql/hive module tests fail. The reason is that bin/spark-submit is not found. The impacted tests are: - SPARK-8468 - SPARK-8020 - SPARK-8489 - SPARK-9757 I'm going to take a look at fixing that.
[jira] [Commented] (SPARK-9776) Another instance of Derby may have already booted the database
[ https://issues.apache.org/jira/browse/SPARK-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936634#comment-14936634 ] Apache Spark commented on SPARK-9776: - User 'KaiXinXiaoLei' has created a pull request for this issue: https://github.com/apache/spark/pull/8947 > Another instance of Derby may have already booted the database > --- > > Key: SPARK-9776 > URL: https://issues.apache.org/jira/browse/SPARK-9776 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 > Environment: Mac Yosemite, spark-1.5.0 >Reporter: Sudhakar Thota > Attachments: SPARK-9776-FL1.rtf > > > val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc) results in > error. Though the same works for spark-1.4.1. > Caused by: ERROR XSDB6: Another instance of Derby may have already booted the > database
[jira] [Updated] (SPARK-10881) Unable to use custom log4j appender in spark executor
[ https://issues.apache.org/jira/browse/SPARK-10881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Moravek updated SPARK-10881: -- Description: In CoarseGrainedExecutorBackend, log4j is initialized, before userclasspath gets registered: https://github.com/apache/spark/blob/v1.3.1/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L126 In order to use custom appender, one has to distribute it using `spark-submit --files` and set it via spark.executor.extraClassPath was: In CoarseGrainedExecutorBackend, log4j is initialized, before userclasspath gets registered: https://github.com/apache/spark/blob/v1.3.1/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L126 In order to use custom appender, one have to distribute it using `spark-submit --files` and set it via spark.executor.extraClassPath > Unable to use custom log4j appender in spark executor > - > > Key: SPARK-10881 > URL: https://issues.apache.org/jira/browse/SPARK-10881 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.1 >Reporter: David Moravek >Priority: Minor > > In CoarseGrainedExecutorBackend, log4j is initialized, before userclasspath > gets registered: > https://github.com/apache/spark/blob/v1.3.1/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L126 > In order to use custom appender, one has to distribute it using `spark-submit > --files` and set it via spark.executor.extraClassPath
[jira] [Created] (SPARK-10881) Unable to use custom log4j appender in spark executor
David Moravek created SPARK-10881: - Summary: Unable to use custom log4j appender in spark executor Key: SPARK-10881 URL: https://issues.apache.org/jira/browse/SPARK-10881 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.1 Reporter: David Moravek Priority: Minor In CoarseGrainedExecutorBackend, log4j is initialized, before userclasspath gets registered: https://github.com/apache/spark/blob/v1.3.1/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L126 In order to use custom appender, one have to distribute it using `spark-submit --files` and set it via spark.executor.extraClassPath
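The initialization-order problem behind this report can be sketched in a few lines of Python (a toy model with hypothetical jar names, not the actual CoarseGrainedExecutorBackend code): logging is configured while only the system classpath is visible, so a handler jar registered on the user classpath afterwards is found too late.

```python
def appender_visible(classpath):
    # stand-in for log4j resolving a custom appender class
    return "custom-appender.jar" in classpath

classpath = ["spark-core.jar"]                        # system classpath at startup
visible_at_log4j_init = appender_visible(classpath)   # log4j initializes here
classpath.append("custom-appender.jar")               # user classpath registered later
visible_after = appender_visible(classpath)

# workaround from the issue: spark.executor.extraClassPath puts the jar on
# the system classpath, so it is already visible when log4j initializes
visible_with_workaround = appender_visible(
    ["custom-appender.jar", "spark-core.jar"])
```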
[jira] [Created] (SPARK-10882) Add the ability to connect to secured mqtt brokers
Alexandre Touret created SPARK-10882: Summary: Add the ability to connect to secured mqtt brokers Key: SPARK-10882 URL: https://issues.apache.org/jira/browse/SPARK-10882 Project: Spark Issue Type: Improvement Components: Input/Output Affects Versions: 1.5.0, 1.5.1 Reporter: Alexandre Touret Currently, there's no way to connect to secured MQTT brokers. For example, I'm trying to subscribe to an MQTT topic hosted on a single RabbitMQ instance. I can't provide the credentials during the connection. Furthermore, I saw in the source code (https://github.com/apache/spark/blob/7478c8b66d6a2b1179f20c38b49e27e37b0caec3/external/mqtt/src/main/scala/org/apache/spark/streaming/mqtt/MQTTInputDStream.scala#L50) that credentials are never initialized. It would be nice to add this ability to Spark. Regards -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10865) [Spark SQL] [UDF] the ceil/ceiling function got wrong return value type
[ https://issues.apache.org/jira/browse/SPARK-10865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936799#comment-14936799 ] Alexey Grishchenko commented on SPARK-10865: I don't really think that _floor()_ and _ceiling()_ should return an integer data type. For instance, in [Oracle|http://docs.oracle.com/cd/E11882_01/server.112/e10592/functions067.htm#SQLRF00643], [Postgres|http://www.postgresql.org/docs/9.5/static/functions-math.html] and [MS SQL|https://msdn.microsoft.com/en-us/library/ms178531.aspx] these functions return the same data type as the input > [Spark SQL] [UDF] the ceil/ceiling function got wrong return value type > --- > > Key: SPARK-10865 > URL: https://issues.apache.org/jira/browse/SPARK-10865 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yi Zhou > > As per the ceil/ceiling definition, it should return a BIGINT value > -ceil(DOUBLE a), ceiling(DOUBLE a) > -Returns the minimum BIGINT value that is equal to or greater than a. > But the current Spark implementation returns the wrong value type. > e.g., > select ceil(2642.12) from udf_test_web_sales limit 1; > 2643.0 > In the Hive implementation, the return value type is correct: > hive> select ceil(2642.12) from udf_test_web_sales limit 1; > OK > 2643 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
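The expected contract can be sanity-checked outside Spark; here is a minimal Python sketch, where Python 3's math.ceil has the same DOUBLE-in, integer-out shape as the Hive ceil quoted above:

```python
import math

# Python 3's math.ceil returns an int for a float input, matching the
# ceil(DOUBLE a) -> BIGINT contract quoted above.
result = math.ceil(2642.12)
print(result)        # 2643
print(type(result))  # <class 'int'>

# The Spark 1.5.0 behavior reported here is the equivalent of getting
# a floating-point 2643.0 back instead of an integral 2643.
```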
[jira] [Commented] (SPARK-10879) spark on yarn support priority option
[ https://issues.apache.org/jira/browse/SPARK-10879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936813#comment-14936813 ] Marcelo Vanzin commented on SPARK-10879: Is this related to YARN-1963? It's nice to reference the source of the functionality that is being implemented. > spark on yarn support priority option > - > > Key: SPARK-10879 > URL: https://issues.apache.org/jira/browse/SPARK-10879 > Project: Spark > Issue Type: Improvement > Components: Spark Submit, YARN >Reporter: Yun Zhao > > Add a YARN-only option to spark-submit: *--priority PRIORITY*. The priority > of your YARN application (Default: 0). > Add a property: *spark.yarn.priority* -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
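As proposed, usage would look roughly like the following; this is a sketch only, since neither the --priority flag nor the spark.yarn.priority property exists in Spark at this point, and the application class and jar names are made up:

```shell
# Hypothetical usage once the proposal is implemented: set the YARN
# application priority via the proposed flag...
spark-submit \
  --master yarn \
  --priority 10 \
  --class com.example.MyApp myapp.jar

# ...or via the proposed configuration property.
spark-submit \
  --master yarn \
  --conf spark.yarn.priority=10 \
  --class com.example.MyApp myapp.jar
```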
[jira] [Assigned] (SPARK-10887) Build HashedRelation outside of HashJoinNode
[ https://issues.apache.org/jira/browse/SPARK-10887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10887: Assignee: Apache Spark (was: Yin Huai) > Build HashedRelation outside of HashJoinNode > > > Key: SPARK-10887 > URL: https://issues.apache.org/jira/browse/SPARK-10887 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Apache Spark > > Right now, HashJoinNode builds a HashRelation for the build side. We can take > this process out. So, we can use HashJoinNode for both Broadcast join and > shuffled join. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10887) Build HashedRelation outside of HashJoinNode
[ https://issues.apache.org/jira/browse/SPARK-10887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939047#comment-14939047 ] Apache Spark commented on SPARK-10887: -- User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/8953 > Build HashedRelation outside of HashJoinNode > > > Key: SPARK-10887 > URL: https://issues.apache.org/jira/browse/SPARK-10887 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Yin Huai > > Right now, HashJoinNode builds a HashRelation for the build side. We can take > this process out. So, we can use HashJoinNode for both Broadcast join and > shuffled join. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10891) Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka
Burak Yavuz created SPARK-10891: --- Summary: Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka Key: SPARK-10891 URL: https://issues.apache.org/jira/browse/SPARK-10891 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Burak Yavuz There is support for a message handler in the Direct Kafka Stream, which allows an arbitrary T to be the output of the stream instead of Array[Byte]. This is a very useful feature, so it should exist in Kinesis as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10560) Make StreamingLogisticRegressionWithSGD Python API equals with Scala one
[ https://issues.apache.org/jira/browse/SPARK-10560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939067#comment-14939067 ] Bryan Cutler commented on SPARK-10560: -- Hi [~yanboliang], I'd be happy to do this unless you have already started? > Make StreamingLogisticRegressionWithSGD Python API equals with Scala one > > > Key: SPARK-10560 > URL: https://issues.apache.org/jira/browse/SPARK-10560 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Reporter: Yanbo Liang >Priority: Minor > > StreamingLogisticRegressionWithSGD Python API lacks of some parameters > compared with Scala one, here we make them equality. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10888) Add as.DataFrame as a synonym for createDataFrame
[ https://issues.apache.org/jira/browse/SPARK-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10888: Assignee: (was: Apache Spark) > Add as.DataFrame as a synonym for createDataFrame > - > > Key: SPARK-10888 > URL: https://issues.apache.org/jira/browse/SPARK-10888 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Narine Kokhlikyan >Priority: Minor > > as.DataFrame is a more R-style signature. > Also, I'd like to know if we could make the context, e.g. sqlContext, global, > so that we do not have to specify it as an argument each time we create > a dataframe. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10888) Add as.DataFrame as a synonym for createDataFrame
[ https://issues.apache.org/jira/browse/SPARK-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938923#comment-14938923 ] Apache Spark commented on SPARK-10888: -- User 'NarineK' has created a pull request for this issue: https://github.com/apache/spark/pull/8952 > Add as.DataFrame as a synonym for createDataFrame > - > > Key: SPARK-10888 > URL: https://issues.apache.org/jira/browse/SPARK-10888 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Narine Kokhlikyan >Priority: Minor > > as.DataFrame is a more R-style signature. > Also, I'd like to know if we could make the context, e.g. sqlContext, global, > so that we do not have to specify it as an argument each time we create > a dataframe. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10889) Upgrade Kinesis Client Library
Avrohom Katz created SPARK-10889: Summary: Upgrade Kinesis Client Library Key: SPARK-10889 URL: https://issues.apache.org/jira/browse/SPARK-10889 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.4.2, 1.5.2, 1.6.0 Reporter: Avrohom Katz Priority: Minor Kinesis Client Library added a custom CloudWatch metric in 1.3.0 called MillisBehindLatest. This is very important for capacity planning and alerting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9617) Implement json_tuple
[ https://issues.apache.org/jira/browse/SPARK-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-9617. - Resolution: Fixed Assignee: Nathan Howell Fix Version/s: 1.6.0 This issue has been resolved by https://github.com/apache/spark/pull/7946/. > Implement json_tuple > > > Key: SPARK-9617 > URL: https://issues.apache.org/jira/browse/SPARK-9617 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Nathan Howell >Assignee: Nathan Howell >Priority: Minor > Fix For: 1.6.0 > > > Provide a native Spark implementation for {{json_tuple}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10887) Build HashedRelation outside of HashJoinNode
[ https://issues.apache.org/jira/browse/SPARK-10887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai reassigned SPARK-10887: Assignee: Yin Huai > Build HashedRelation outside of HashJoinNode > > > Key: SPARK-10887 > URL: https://issues.apache.org/jira/browse/SPARK-10887 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Yin Huai > > Right now, HashJoinNode builds a HashRelation for the build side. We can take > this process out. So, we can use HashJoinNode for both Broadcast join and > shuffled join. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10887) Build HashedRelation outside of HashJoinNode
[ https://issues.apache.org/jira/browse/SPARK-10887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10887: Assignee: Yin Huai (was: Apache Spark) > Build HashedRelation outside of HashJoinNode > > > Key: SPARK-10887 > URL: https://issues.apache.org/jira/browse/SPARK-10887 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Yin Huai > > Right now, HashJoinNode builds a HashRelation for the build side. We can take > this process out. So, we can use HashJoinNode for both Broadcast join and > shuffled join. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10790) Dynamic Allocation does not request any executors if first stage needs less than or equal to spark.dynamicAllocation.initialExecutors
[ https://issues.apache.org/jira/browse/SPARK-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939060#comment-14939060 ] Jonathan Kelly commented on SPARK-10790: Thank you for the explanation and for such a quick fix, [~jerryshao]! > Dynamic Allocation does not request any executors if first stage needs less > than or equal to spark.dynamicAllocation.initialExecutors > - > > Key: SPARK-10790 > URL: https://issues.apache.org/jira/browse/SPARK-10790 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 >Reporter: Jonathan Kelly >Assignee: Saisai Shao > Fix For: 1.5.2, 1.6.0 > > > If you set spark.dynamicAllocation.initialExecutors > 0 (or > spark.dynamicAllocation.minExecutors, since > spark.dynamicAllocation.initialExecutors defaults to > spark.dynamicAllocation.minExecutors), and the number of tasks in the first > stage of your job is less than or equal to this min/init number of executors, > dynamic allocation won't actually request any executors and will just hang > indefinitely with the warning "Initial job has not accepted any resources; > check your cluster UI to ensure that workers are registered and have > sufficient resources". > The cause appears to be that ExecutorAllocationManager does not request any > executors while the application is still initializing, but it still sets the > initial value of numExecutorsTarget to > spark.dynamicAllocation.initialExecutors. Once the job is running and has > submitted its first task, if the first task does not need more than > spark.dynamicAllocation.initialExecutors, > ExecutorAllocationManager.updateAndSyncNumExecutorsTarget() does not think > that it needs to request any executors, so it doesn't. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
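For reference, the configuration knobs involved look like the following; the values are illustrative only (not taken from the report), and the hang occurred when the first stage had no more tasks than this initial/min executor count:

```shell
# Illustrative values: with initialExecutors=5, a first stage of <= 5 tasks
# hit the hang described above on Spark 1.5.0 (fixed in 1.5.2 / 1.6.0).
# The application class and jar are hypothetical.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.initialExecutors=5 \
  --conf spark.shuffle.service.enabled=true \
  --class com.example.MyApp myapp.jar
```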
[jira] [Updated] (SPARK-10892) Join with Data Frame returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-10892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ofer Mendelevitch updated SPARK-10892: -- Attachment: data.json Data file to run with the code, for reproducing. > Join with Data Frame returns wrong results > -- > > Key: SPARK-10892 > URL: https://issues.apache.org/jira/browse/SPARK-10892 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 >Reporter: Ofer Mendelevitch >Priority: Critical > Attachments: data.json > > > I'm attaching a simplified reproducible example of the problem: > 1. Loading a JSON file from HDFS as a Data Frame > 2. Creating 3 data frames: PRCP, TMIN, TMAX > 3. Joining the data frames together. Each of those has a column "value" with > the same name, so renaming them after the join. > 4. The output seems incorrect; the first column has the correct values, but > the two other columns seem to have a copy of the values from the first column. > Here's the sample code: > import org.apache.spark.sql._ > val sqlc = new SQLContext(sc) > val weather = sqlc.read.format("json").load("data.json") > val prcp = weather.filter("metric = 'PRCP'").as("prcp").cache() > val tmin = weather.filter("metric = 'TMIN'").as("tmin").cache() > val tmax = weather.filter("metric = 'TMAX'").as("tmax").cache() > prcp.filter("year=2012 and month=10").show() > tmin.filter("year=2012 and month=10").show() > tmax.filter("year=2012 and month=10").show() > val out = (prcp.join(tmin, "date_str").join(tmax, "date_str") > .select(prcp("year"), prcp("month"), prcp("day"), prcp("date_str"), > prcp("value").alias("PRCP"), tmin("value").alias("TMIN"), > tmax("value").alias("TMAX")) ) > out.filter("year=2012 and month=10").show() > The output is: > ++---+--+-+---+-++ > |date_str|day|metric|month|station|value|year| > ++---+--+-+---+-++ > |20121001| 1| PRCP| 10|USW00023272|0|2012| > |20121002| 2| PRCP| 10|USW00023272|0|2012| > |20121003| 3| PRCP| 10|USW00023272|0|2012| > |20121004| 4| 
PRCP| 10|USW00023272|0|2012| > |20121005| 5| PRCP| 10|USW00023272|0|2012| > |20121006| 6| PRCP| 10|USW00023272|0|2012| > |20121007| 7| PRCP| 10|USW00023272|0|2012| > |20121008| 8| PRCP| 10|USW00023272|0|2012| > |20121009| 9| PRCP| 10|USW00023272|0|2012| > |20121010| 10| PRCP| 10|USW00023272|0|2012| > |20121011| 11| PRCP| 10|USW00023272|3|2012| > |20121012| 12| PRCP| 10|USW00023272|0|2012| > |20121013| 13| PRCP| 10|USW00023272|0|2012| > |20121014| 14| PRCP| 10|USW00023272|0|2012| > |20121015| 15| PRCP| 10|USW00023272|0|2012| > |20121016| 16| PRCP| 10|USW00023272|0|2012| > |20121017| 17| PRCP| 10|USW00023272|0|2012| > |20121018| 18| PRCP| 10|USW00023272|0|2012| > |20121019| 19| PRCP| 10|USW00023272|0|2012| > |20121020| 20| PRCP| 10|USW00023272|0|2012| > ++---+--+-+---+-+——+ > ++---+--+-+---+-++ > |date_str|day|metric|month|station|value|year| > ++---+--+-+---+-++ > |20121001| 1| TMIN| 10|USW00023272| 139|2012| > |20121002| 2| TMIN| 10|USW00023272| 178|2012| > |20121003| 3| TMIN| 10|USW00023272| 144|2012| > |20121004| 4| TMIN| 10|USW00023272| 144|2012| > |20121005| 5| TMIN| 10|USW00023272| 139|2012| > |20121006| 6| TMIN| 10|USW00023272| 128|2012| > |20121007| 7| TMIN| 10|USW00023272| 122|2012| > |20121008| 8| TMIN| 10|USW00023272| 122|2012| > |20121009| 9| TMIN| 10|USW00023272| 139|2012| > |20121010| 10| TMIN| 10|USW00023272| 128|2012| > |20121011| 11| TMIN| 10|USW00023272| 122|2012| > |20121012| 12| TMIN| 10|USW00023272| 117|2012| > |20121013| 13| TMIN| 10|USW00023272| 122|2012| > |20121014| 14| TMIN| 10|USW00023272| 128|2012| > |20121015| 15| TMIN| 10|USW00023272| 128|2012| > |20121016| 16| TMIN| 10|USW00023272| 156|2012| > |20121017| 17| TMIN| 10|USW00023272| 139|2012| > |20121018| 18| TMIN| 10|USW00023272| 161|2012| > |20121019| 19| TMIN| 10|USW00023272| 133|2012| > |20121020| 20| TMIN| 10|USW00023272| 122|2012| > ++---+--+-+---+-+——+ > ++---+--+-+---+-++ > |date_str|day|metric|month|station|value|year| > ++---+--+-+---+-++ > |20121001| 1| TMAX| 10|USW00023272| 
322|2012| > |20121002| 2| TMAX| 10|USW00023272| 344|2012| > |20121003| 3| TMAX| 10|USW00023272| 222|2012| > |20121004| 4| TMAX| 10|USW00023272| 189|2012| > |20121005| 5| TMAX| 10|USW00023272| 194|2012| > |20121006| 6| TMAX| 10|USW00023272| 200|2012| >
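The usual workaround for ambiguous column names is to rename the shared "value" column on each input *before* joining, rather than disambiguating afterwards. Below is a plain-Python sketch of that shape (not Spark code, and toy data rather than the attached data.json); in Spark itself the analogous step would be a withColumnRenamed on each DataFrame before the join:

```python
# Workaround sketch (plain Python, not Spark): when several inputs share a
# column name ("value"), rename it per input *before* joining so the joined
# row can never pick up the wrong source. Data below is a tiny made-up sample.
prcp = {"20121001": 0,   "20121002": 0}
tmin = {"20121001": 139, "20121002": 178}
tmax = {"20121001": 322, "20121002": 344}

# Rename step: give each metric's "value" a distinct key, then join on date.
joined = {
    d: {"PRCP": prcp[d], "TMIN": tmin[d], "TMAX": tmax[d]}
    for d in prcp.keys() & tmin.keys() & tmax.keys()
}
print(joined["20121001"])  # {'PRCP': 0, 'TMIN': 139, 'TMAX': 322}
```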
[jira] [Updated] (SPARK-10887) Build HashedRelation outside of HashJoinNode
[ https://issues.apache.org/jira/browse/SPARK-10887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-10887: - Summary: Build HashedRelation outside of HashJoinNode (was: Build HashRelation outside of HashJoinNode) > Build HashedRelation outside of HashJoinNode > > > Key: SPARK-10887 > URL: https://issues.apache.org/jira/browse/SPARK-10887 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > Right now, HashJoinNode builds a HashRelation for the build side. We can take > this process out. So, we can use HashJoinNode for both Broadcast join and > shuffled join. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10888) Add as.DataFrame as a synonym for createDataFrame
[ https://issues.apache.org/jira/browse/SPARK-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10888: Assignee: Apache Spark > Add as.DataFrame as a synonym for createDataFrame > - > > Key: SPARK-10888 > URL: https://issues.apache.org/jira/browse/SPARK-10888 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Narine Kokhlikyan >Assignee: Apache Spark >Priority: Minor > > as.DataFrame is a more R-style signature. > Also, I'd like to know if we could make the context, e.g. sqlContext, global, > so that we do not have to specify it as an argument each time we create > a dataframe. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual
[ https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-10561: --- Description: Here is the discussion thread: http://search-hadoop.com/m/q3RTtcD20F1o62xE Richard Hillegas made the following suggestion: A machine-generated BNF, however, is easy to imagine. But perhaps not so easy to implement. Spark's SQL grammar is implemented in Scala, extending the DSL support provided by the Scala language. I am new to programming in Scala, so I don't know whether the Scala ecosystem provides any good tools for reverse-engineering a BNF from a class which extends scala.util.parsing.combinator.syntactical.StandardTokenParsers. was: Here is the discussion thread: http://search-hadoop.com/m/q3RTtcD20F1o62xE Richard Hillegas made the following suggestion: A machine-generated BNF, however, is easy to imagine. But perhaps not so easy to implement. Spark's SQL grammar is implemented in Scala, extending the DSL support provided by the Scala language. I am new to programming in Scala, so I don't know whether the Scala ecosystem provides any good tools for reverse-engineering a BNF from a class which extends scala.util.parsing.combinator.syntactical.StandardTokenParsers. > Provide tooling for auto-generating Spark SQL reference manual > -- > > Key: SPARK-10561 > URL: https://issues.apache.org/jira/browse/SPARK-10561 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Reporter: Ted Yu > > Here is the discussion thread: > http://search-hadoop.com/m/q3RTtcD20F1o62xE > Richard Hillegas made the following suggestion: > A machine-generated BNF, however, is easy to imagine. But perhaps not so easy > to implement. Spark's SQL grammar is implemented in Scala, extending the DSL > support provided by the Scala language. 
I am new to programming in Scala, so > I don't know whether the Scala ecosystem provides any good tools for > reverse-engineering a BNF from a class which extends > scala.util.parsing.combinator.syntactical.StandardTokenParsers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10890) "Column count does not match; SQL statement:" error in JDBCWriteSuite
Rick Hillegas created SPARK-10890: - Summary: "Column count does not match; SQL statement:" error in JDBCWriteSuite Key: SPARK-10890 URL: https://issues.apache.org/jira/browse/SPARK-10890 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.5.0 Reporter: Rick Hillegas I get the following error when I run the following test... mvn -Dhadoop.version=2.4.0 -DwildcardSuites=org.apache.spark.sql.jdbc.JDBCWriteSuite test {noformat} JDBCWriteSuite: 13:22:15.603 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 13:22:16.506 WARN org.apache.spark.metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. - Basic CREATE - CREATE with overwrite - CREATE then INSERT to append - CREATE then INSERT to truncate 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 23.0 (TID 31) org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement: INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) 
[21002-183] at org.h2.message.DbException.getJdbcSQLException(DbException.java:345) at org.h2.message.DbException.get(DbException.java:179) at org.h2.message.DbException.get(DbException.java:155) at org.h2.message.DbException.get(DbException.java:144) at org.h2.command.dml.Insert.prepare(Insert.java:265) at org.h2.command.Parser.prepareCommand(Parser.java:247) at org.h2.engine.Session.prepareLocal(Session.java:446) at org.h2.engine.Session.prepareCommand(Session.java:388) at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189) at org.h2.jdbc.JdbcPreparedStatement.(JdbcPreparedStatement.java:72) at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in stage 23.0 (TID 32) org.h2.jdbc.JdbcSQLException: Column 
count does not match; SQL statement: INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183] at org.h2.message.DbException.getJdbcSQLException(DbException.java:345) at org.h2.message.DbException.get(DbException.java:179) at org.h2.message.DbException.get(DbException.java:155) at org.h2.message.DbException.get(DbException.java:144) at org.h2.command.dml.Insert.prepare(Insert.java:265) at org.h2.command.Parser.prepareCommand(Parser.java:247) at org.h2.engine.Session.prepareLocal(Session.java:446) at org.h2.engine.Session.prepareCommand(Session.java:388) at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189) at org.h2.jdbc.JdbcPreparedStatement.(JdbcPreparedStatement.java:72) at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892) at
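The failure is generic SQL rather than H2-specific; a minimal sqlite3 reproduction (table and column names made up for illustration) shows the same class of error, plus the conventional fix of naming the target columns explicitly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incompatible_test (a INTEGER, b TEXT)")  # 2 columns

# Mismatched value count -> the same class of error H2 raises above.
try:
    conn.execute("INSERT INTO incompatible_test VALUES (?, ?, ?)", (1, "x", "y"))
except sqlite3.OperationalError as e:
    print(e)  # table incompatible_test has 2 columns but 3 values were supplied

# Listing the target columns makes the statement robust to schema drift.
conn.execute("INSERT INTO incompatible_test (a, b) VALUES (?, ?)", (1, "x"))
print(conn.execute("SELECT * FROM incompatible_test").fetchall())  # [(1, 'x')]
```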
[jira] [Commented] (SPARK-10889) Upgrade Kinesis Client Library
[ https://issues.apache.org/jira/browse/SPARK-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939023#comment-14939023 ] Burak Yavuz commented on SPARK-10889: - In addition, KCL 1.4.0 supports de-aggregation of records. > Upgrade Kinesis Client Library > -- > > Key: SPARK-10889 > URL: https://issues.apache.org/jira/browse/SPARK-10889 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.4.2, 1.5.2, 1.6.0 >Reporter: Avrohom Katz >Priority: Minor > > Kinesis Client Library added a custom cloudwatch metric in 1.3.0 called > MillisBehindLatest. This is very important for capacity planning and alerting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10337) Views are broken
[ https://issues.apache.org/jira/browse/SPARK-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-10337: - Assignee: Wenchen Fan > Views are broken > > > Key: SPARK-10337 > URL: https://issues.apache.org/jira/browse/SPARK-10337 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Michael Armbrust >Assignee: Wenchen Fan >Priority: Critical > > I haven't dug into this yet... but it seems like this should work: > This works: > {code} > SELECT * FROM 100milints > {code} > This seems to work: > {code} > CREATE VIEW testView AS SELECT * FROM 100milints > {code} > This fails: > {code} > SELECT * FROM testView > org.apache.spark.sql.AnalysisException: cannot resolve '100milints.col' given > input columns id; line 1 pos 7 > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:53) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:292) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:290) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:290) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:279) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:290) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:108) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:118) > at > org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:122) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:122) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:126) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at
[jira] [Created] (SPARK-10892) Join with Data Frame returns wrong results
Ofer Mendelevitch created SPARK-10892: - Summary: Join with Data Frame returns wrong results Key: SPARK-10892 URL: https://issues.apache.org/jira/browse/SPARK-10892 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0, 1.4.1 Reporter: Ofer Mendelevitch Priority: Critical

I'm attaching a simplified reproducible example of the problem:
1. Loading a JSON file from HDFS as a Data Frame
2. Creating 3 data frames: PRCP, TMIN, TMAX
3. Joining the data frames together. Each of those has a column "value" with the same name, so renaming them after the join.
4. The output seems incorrect; the first column has the correct values, but the two other columns seem to have a copy of the values from the first column.

Here's the sample code:

import org.apache.spark.sql._
val sqlc = new SQLContext(sc)
val weather = sqlc.read.format("json").load("data.json")

val prcp = weather.filter("metric = 'PRCP'").as("prcp").cache()
val tmin = weather.filter("metric = 'TMIN'").as("tmin").cache()
val tmax = weather.filter("metric = 'TMAX'").as("tmax").cache()

prcp.filter("year=2012 and month=10").show()
tmin.filter("year=2012 and month=10").show()
tmax.filter("year=2012 and month=10").show()

val out = (prcp.join(tmin, "date_str").join(tmax, "date_str")
  .select(prcp("year"), prcp("month"), prcp("day"), prcp("date_str"),
    prcp("value").alias("PRCP"), tmin("value").alias("TMIN"), tmax("value").alias("TMAX"))
)

out.filter("year=2012 and month=10").show()

The output is:

+--------+---+------+-----+-----------+-----+----+
|date_str|day|metric|month|    station|value|year|
+--------+---+------+-----+-----------+-----+----+
|20121001|  1|  PRCP|   10|USW00023272|    0|2012|
|20121002|  2|  PRCP|   10|USW00023272|    0|2012|
|20121003|  3|  PRCP|   10|USW00023272|    0|2012|
|20121004|  4|  PRCP|   10|USW00023272|    0|2012|
|20121005|  5|  PRCP|   10|USW00023272|    0|2012|
|20121006|  6|  PRCP|   10|USW00023272|    0|2012|
|20121007|  7|  PRCP|   10|USW00023272|    0|2012|
|20121008|  8|  PRCP|   10|USW00023272|    0|2012|
|20121009|  9|  PRCP|   10|USW00023272|    0|2012|
|20121010| 10|  PRCP|   10|USW00023272|    0|2012|
|20121011| 11|  PRCP|   10|USW00023272|    3|2012|
|20121012| 12|  PRCP|   10|USW00023272|    0|2012|
|20121013| 13|  PRCP|   10|USW00023272|    0|2012|
|20121014| 14|  PRCP|   10|USW00023272|    0|2012|
|20121015| 15|  PRCP|   10|USW00023272|    0|2012|
|20121016| 16|  PRCP|   10|USW00023272|    0|2012|
|20121017| 17|  PRCP|   10|USW00023272|    0|2012|
|20121018| 18|  PRCP|   10|USW00023272|    0|2012|
|20121019| 19|  PRCP|   10|USW00023272|    0|2012|
|20121020| 20|  PRCP|   10|USW00023272|    0|2012|
+--------+---+------+-----+-----------+-----+----+

+--------+---+------+-----+-----------+-----+----+
|date_str|day|metric|month|    station|value|year|
+--------+---+------+-----+-----------+-----+----+
|20121001|  1|  TMIN|   10|USW00023272|  139|2012|
|20121002|  2|  TMIN|   10|USW00023272|  178|2012|
|20121003|  3|  TMIN|   10|USW00023272|  144|2012|
|20121004|  4|  TMIN|   10|USW00023272|  144|2012|
|20121005|  5|  TMIN|   10|USW00023272|  139|2012|
|20121006|  6|  TMIN|   10|USW00023272|  128|2012|
|20121007|  7|  TMIN|   10|USW00023272|  122|2012|
|20121008|  8|  TMIN|   10|USW00023272|  122|2012|
|20121009|  9|  TMIN|   10|USW00023272|  139|2012|
|20121010| 10|  TMIN|   10|USW00023272|  128|2012|
|20121011| 11|  TMIN|   10|USW00023272|  122|2012|
|20121012| 12|  TMIN|   10|USW00023272|  117|2012|
|20121013| 13|  TMIN|   10|USW00023272|  122|2012|
|20121014| 14|  TMIN|   10|USW00023272|  128|2012|
|20121015| 15|  TMIN|   10|USW00023272|  128|2012|
|20121016| 16|  TMIN|   10|USW00023272|  156|2012|
|20121017| 17|  TMIN|   10|USW00023272|  139|2012|
|20121018| 18|  TMIN|   10|USW00023272|  161|2012|
|20121019| 19|  TMIN|   10|USW00023272|  133|2012|
|20121020| 20|  TMIN|   10|USW00023272|  122|2012|
+--------+---+------+-----+-----------+-----+----+

+--------+---+------+-----+-----------+-----+----+
|date_str|day|metric|month|    station|value|year|
+--------+---+------+-----+-----------+-----+----+
|20121001|  1|  TMAX|   10|USW00023272|  322|2012|
|20121002|  2|  TMAX|   10|USW00023272|  344|2012|
|20121003|  3|  TMAX|   10|USW00023272|  222|2012|
|20121004|  4|  TMAX|   10|USW00023272|  189|2012|
|20121005|  5|  TMAX|   10|USW00023272|  194|2012|
|20121006|  6|  TMAX|   10|USW00023272|  200|2012|
|20121007|  7|  TMAX|   10|USW00023272|  167|2012|
|20121008|  8|  TMAX|   10|USW00023272|  183|2012|
|20121009|  9|  TMAX|   10|USW00023272|  194|2012|
|20121010| 10|  TMAX|   10|USW00023272|  183|2012|
|20121011| 11|  TMAX|   10|USW00023272|  139|2012|
|20121012| 12|  TMAX|   10|USW00023272|  161|2012|
|20121013| 13|  TMAX|   10|USW00023272|  211|2012|
|20121014| 14|  TMAX|   10|USW00023272|  189|2012|
|20121015| 15|
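The result the join above should produce can be sketched in plain Python (an illustrative emulation, not Spark code; only the first two dates from the tables above are used): each date should carry a distinct PRCP, TMIN, and TMAX value, whereas the reported bug copies the PRCP value into all three output columns.

```python
# Emulate the intended three-way join by date_str with plain dicts.
# Values are copied from the PRCP/TMIN/TMAX tables above (subset:
# first two dates only, for illustration).
prcp = {"20121001": 0, "20121002": 0}
tmin = {"20121001": 139, "20121002": 178}
tmax = {"20121001": 322, "20121002": 344}

rows = [
    {"date_str": d, "PRCP": prcp[d], "TMIN": tmin[d], "TMAX": tmax[d]}
    for d in sorted(prcp)
]

# A correct join keeps a distinct value per metric; the reported bug
# instead repeated the PRCP value in all three renamed columns.
print(rows[0])  # {'date_str': '20121001', 'PRCP': 0, 'TMIN': 139, 'TMAX': 322}
```

In Spark itself, a common workaround for ambiguous column names is to rename the "value" column in each DataFrame (e.g. with withColumnRenamed) before joining, so the select after the join cannot pick up the wrong column.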
[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior
[ https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939130#comment-14939130 ] Josh Rosen commented on SPARK-9344: --- I think this is the relevant part of the stacktrace: {code} org.apache.spark.sql.AnalysisException: no such table test_data1; {code} The Data Sources {{save}} method does not automatically register / create tables or temp tables. Did you mean to use {{saveAsTable}} instead? See https://spark.apache.org/docs/latest/sql-programming-guide.html#saving-to-persistent-tables > Spark SQL documentation does not clarify INSERT INTO behavior > - > > Key: SPARK-9344 > URL: https://issues.apache.org/jira/browse/SPARK-9344 > Project: Spark > Issue Type: Documentation > Components: Documentation, SQL >Affects Versions: 1.4.1 >Reporter: Simeon Simeonov >Priority: Minor > Labels: documentation, sql > > The Spark SQL documentation does not address {{INSERT INTO}} behavior. The > section on Hive compatibility is misleading as it claims support for "the > vast majority of Hive features". The user mailing list has conflicting > information, including posts that claim {{INSERT INTO}} support targeting 1.0. > In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10893) Lag Analytic function broken
[ https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jo Desmet updated SPARK-10893: -- Description: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));

was: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));

> Lag Analytic function broken
>
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
> Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster mode.
> It works fine when running on Spark 1.4.1, or when running in local mode.
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> Java:
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
>
> df = df.withColumn(
>     "previous",
>     lag(df.col("VBB"), 1)
>         .over(Window.orderBy(df.col("VAA"))));
[jira] [Created] (SPARK-10893) Lag Analytic function broken
Jo Desmet created SPARK-10893: - Summary: Lag Analytic function broken Key: SPARK-10893 URL: https://issues.apache.org/jira/browse/SPARK-10893 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 1.5.0 Environment: Spark Standalone Cluster on Linux Reporter: Jo Desmet

Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));
[jira] [Updated] (SPARK-10893) Lag Analytic function broken
[ https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jo Desmet updated SPARK-10893: -- Description: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
{code:title=Bar.java|borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));
{code}

was: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));

> Lag Analytic function broken
>
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
> Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster mode.
> It works fine when running on Spark 1.4.1, or when running in local mode.
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> Java:
> {code:title=Bar.java|borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
>
> df = df.withColumn(
>     "previous",
>     lag(df.col("VBB"), 1)
>         .over(Window.orderBy(df.col("VAA"))));
> {code}
[jira] [Updated] (SPARK-10893) Lag Analytic function broken
[ https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jo Desmet updated SPARK-10893: -- Description: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));
{code}

was: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
{code:title=Bar.java|borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));
{code}

> Lag Analytic function broken
>
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
> Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster mode.
> It works fine when running on Spark 1.4.1, or when running in local mode.
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> Java:
> {code:borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
>
> df = df.withColumn(
>     "previous",
>     lag(df.col("VBB"), 1)
>         .over(Window.orderBy(df.col("VAA"))));
> {code}
[jira] [Updated] (SPARK-10893) Lag Analytic function broken
[ https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jo Desmet updated SPARK-10893: -- Description: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));
{code}

was: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));
{code}

> Lag Analytic function broken
>
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
> Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster mode.
> It works fine when running on Spark 1.4.1, or when running in local mode.
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> {code}
> Java:
> {code:borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
>
> df = df.withColumn(
>     "previous",
>     lag(df.col("VBB"), 1)
>         .over(Window.orderBy(df.col("VAA"))));
> {code}
[jira] [Updated] (SPARK-10893) Lag Analytic function broken
[ https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jo Desmet updated SPARK-10893: -- Description: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}

was: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

[null,null,null] [A,1,null] [B,-1,1] [C,2,-1] [d,3,2]

> Lag Analytic function broken
>
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
> Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster mode.
> It works fine when running on Spark 1.4.1, or when running in local mode.
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> {code}
> Java:
> {code:borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
>
> df = df.withColumn(
>     "previous",
>     lag(df.col("VBB"), 1)
>         .over(Window.orderBy(df.col("VAA"))));
> {code}
> Expected Result:
> {code:borderStyle=solid}
> {"VAA":null, "VBB":null, "previous":null}
> {"VAA":"A", "VBB":1, "previous":null}
> {"VAA":"B", "VBB":-1, "previous":1}
> {"VAA":"C", "VBB":2, "previous":-1}
> {"VAA":"d", "VBB":3, "previous":2}
> {code}
> Actual Result:
> {code:borderStyle=solid}
> {"VAA":null, "VBB":null, "previous":null}
> {"VAA":"A", "VBB":1, "previous":103079215105}
> {"VAA":"B", "VBB":-1, "previous":103079215105}
> {"VAA":"C", "VBB":2, "previous":103079215105}
> {"VAA":"d", "VBB":3, "previous":103079215105}
> {code}
[jira] [Updated] (SPARK-10893) Lag Analytic function broken
[ https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jo Desmet updated SPARK-10893: -- Description: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}

was: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA"))));
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}

> Lag Analytic function broken
>
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
> Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster mode.
> It works fine when running on Spark 1.4.1, or when running in local mode.
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> {code}
> Java:
> {code:borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
>
> df = df.withColumn(
>     "previous",
>     lag(df.col("VBB"), 1)
>         .over(Window.orderBy(df.col("VAA"))));
> {code}
> Expected Result:
> {code:borderStyle=solid}
> {"VAA":null, "VBB":null, "previous":null}
> {"VAA":"A", "VBB":1, "previous":null}
> {"VAA":"B", "VBB":-1, "previous":1}
> {"VAA":"C", "VBB":2, "previous":-1}
> {"VAA":"d", "VBB":3, "previous":2}
> {code}
> Actual Result:
> {code:borderStyle=solid}
> {"VAA":null, "VBB":null, "previous":103079215105}
> {"VAA":"A", "VBB":1, "previous":103079215105}
> {"VAA":"B", "VBB":-1, "previous":103079215105}
> {"VAA":"C", "VBB":2, "previous":103079215105}
> {"VAA":"d", "VBB":3, "previous":103079215105}
> {code}
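The LAG semantics this report expects can be sketched as a reference implementation in plain Python (illustrative only; it assumes ascending ordering with nulls first, which matches the ordering in the expected output above):

```python
# Reference semantics of LAG(value, 1) OVER (ORDER BY key), in plain Python.
# Assumption: nulls sort first in ascending order, matching the expected
# result shown in the report.
def lag1(rows, key, col):
    ordered = sorted(rows, key=lambda r: (r[key] is not None, r[key]))
    prev, out = None, []
    for r in ordered:
        out.append({**r, "previous": prev})  # previous row's value, or None
        prev = r[col]
    return out

data = [
    {"VAA": "A", "VBB": 1},
    {"VAA": "B", "VBB": -1},
    {"VAA": "C", "VBB": 2},
    {"VAA": "d", "VBB": 3},
    {"VAA": None, "VBB": None},
]

for row in lag1(data, "VAA", "VBB"):
    print(row)
# "previous" column, in order: None, None, 1, -1, 2
```

Every row should see a different "previous" value; the constant 103079215105 in the actual result suggests the window aggregate is reading a stale or misinterpreted buffer rather than the lagged integer.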
[jira] [Resolved] (SPARK-10807) Add as.data.frame() as a synonym for collect()
[ https://issues.apache.org/jira/browse/SPARK-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-10807. --- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8908 [https://github.com/apache/spark/pull/8908] > Add as.data.frame() as a synonym for collect() > -- > > Key: SPARK-10807 > URL: https://issues.apache.org/jira/browse/SPARK-10807 > Project: Spark > Issue Type: New Feature > Components: SparkR >Affects Versions: 1.5.0 >Reporter: Oscar D. Lara Yejas >Priority: Minor > Fix For: 1.6.0 > >
[jira] [Updated] (SPARK-10807) Add as.data.frame() as a synonym for collect()
[ https://issues.apache.org/jira/browse/SPARK-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-10807: -- Assignee: Oscar D. Lara Yejas > Add as.data.frame() as a synonym for collect() > -- > > Key: SPARK-10807 > URL: https://issues.apache.org/jira/browse/SPARK-10807 > Project: Spark > Issue Type: New Feature > Components: SparkR >Affects Versions: 1.5.0 >Reporter: Oscar D. Lara Yejas >Assignee: Oscar D. Lara Yejas >Priority: Minor > Fix For: 1.6.0 > >
[jira] [Updated] (SPARK-10893) Lag Analytic function broken
[ https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jo Desmet updated SPARK-10893: -- Description: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. Input Jason: {"VAA":"A", "VBB":1} {"VAA":"B", "VBB":-1} {"VAA":"C", "VBB":2} {"VAA":"d", "VBB":3} {"VAA":null, "VBB":null} Java: SparkContext sc = new SparkContext(conf); HiveContext sqlContext = new HiveContext(sc); DataFrame df = sqlContext.read().json(getInputPath("input.json")); df = df.withColumn( "previous", lag(dataFrame.col("VBB"), 1) .over(Window.orderBy(dataFrame.col("VAA"))) ); was: Trying to aggregate with the LAG Analytic function gives the wrong result. In my testcase it was always giving the fixed value '103079215105' when I tried to run on an integer. Input Jason: {"VAA":"A", "VBB":1} {"VAA":"B", "VBB":-1} {"VAA":"C", "VBB":2} {"VAA":"d", "VBB":3} {"VAA":null, "VBB":null} Java: SparkContext sc = new SparkContext(conf); HiveContext sqlContext = new HiveContext(sc); DataFrame df = sqlContext.read().json(getInputPath("input.json")); df = df.withColumn( "previous", lag(dataFrame.col("VBB"), 1) .over(Window.orderBy(dataFrame.col("VAA"))) ); > Lag Analytic function broken > > > Key: SPARK-10893 > URL: https://issues.apache.org/jira/browse/SPARK-10893 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.5.0 > Environment: Spark Standalone Cluster on Linux >Reporter: Jo Desmet > > Trying to aggregate with the LAG Analytic function gives the wrong result. In > my testcase it was always giving the fixed value '103079215105' when I tried > to run on an integer. 
> Note that this only happens on Spark 1.5.0, and only when running in cluster > mode. > It works fine when running on Spark 1.4.1, or when running in local mode. I > did not test on a yarn cluster. > Input JSON: > {"VAA":"A", "VBB":1} > {"VAA":"B", "VBB":-1} > {"VAA":"C", "VBB":2} > {"VAA":"d", "VBB":3} > {"VAA":null, "VBB":null} > Java: > SparkContext sc = new SparkContext(conf); > HiveContext sqlContext = new HiveContext(sc); > DataFrame df = sqlContext.read().json(getInputPath("input.json")); > > df = df.withColumn( > "previous", > lag(df.col("VBB"), 1) > .over(Window.orderBy(df.col("VAA"))) > ); -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10893) Lag Analytic function broken
[ https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jo Desmet updated SPARK-10893: -- Description: Trying to aggregate with the LAG Analytic function gives the wrong result. In my test case it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates. Input JSON: {code:borderStyle=solid} {"VAA":"A", "VBB":1} {"VAA":"B", "VBB":-1} {"VAA":"C", "VBB":2} {"VAA":"d", "VBB":3} {"VAA":null, "VBB":null} {code} Java: {code:borderStyle=solid} SparkContext sc = new SparkContext(conf); HiveContext sqlContext = new HiveContext(sc); DataFrame df = sqlContext.read().json(getInputPath("input.json")); df = df.withColumn( "previous", lag(df.col("VBB"), 1) .over(Window.orderBy(df.col("VAA"))) ); {code} Expected Result: {code:borderStyle=solid} {"VAA":null, "VBB":null, "previous":null} {"VAA":"A", "VBB":1, "previous":null} {"VAA":"B", "VBB":-1, "previous":1} {"VAA":"C", "VBB":2, "previous":-1} {"VAA":"d", "VBB":3, "previous":2} {code} [null,null,null] [A,1,null] [B,-1,1] [C,2,-1] [d,3,2] was: Trying to aggregate with the LAG Analytic function gives the wrong result. In my test case it was always giving the fixed value '103079215105' when I tried to run on an integer. Note that this only happens on Spark 1.5.0, and only when running in cluster mode. It works fine when running on Spark 1.4.1, or when running in local mode. I did not test on a yarn cluster. I did not test other analytic aggregates. 
Input JSON: {code:borderStyle=solid} {"VAA":"A", "VBB":1} {"VAA":"B", "VBB":-1} {"VAA":"C", "VBB":2} {"VAA":"d", "VBB":3} {"VAA":null, "VBB":null} {code} Java: {code:borderStyle=solid} SparkContext sc = new SparkContext(conf); HiveContext sqlContext = new HiveContext(sc); DataFrame df = sqlContext.read().json(getInputPath("input.json")); df = df.withColumn( "previous", lag(df.col("VBB"), 1) .over(Window.orderBy(df.col("VAA"))) ); {code} > Lag Analytic function broken > > > Key: SPARK-10893 > URL: https://issues.apache.org/jira/browse/SPARK-10893 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.5.0 > Environment: Spark Standalone Cluster on Linux >Reporter: Jo Desmet > > Trying to aggregate with the LAG Analytic function gives the wrong result. In > my test case it was always giving the fixed value '103079215105' when I tried > to run on an integer. > Note that this only happens on Spark 1.5.0, and only when running in cluster > mode. > It works fine when running on Spark 1.4.1, or when running in local mode. > I did not test on a yarn cluster. > I did not test other analytic aggregates. 
> Input JSON: > {code:borderStyle=solid} > {"VAA":"A", "VBB":1} > {"VAA":"B", "VBB":-1} > {"VAA":"C", "VBB":2} > {"VAA":"d", "VBB":3} > {"VAA":null, "VBB":null} > {code} > Java: > {code:borderStyle=solid} > SparkContext sc = new SparkContext(conf); > HiveContext sqlContext = new HiveContext(sc); > DataFrame df = sqlContext.read().json(getInputPath("input.json")); > > df = df.withColumn( > "previous", > lag(df.col("VBB"), 1) > .over(Window.orderBy(df.col("VAA"))) > ); > {code} > Expected Result: > {code:borderStyle=solid} > {"VAA":null, "VBB":null, "previous":null} > {"VAA":"A", "VBB":1, "previous":null} > {"VAA":"B", "VBB":-1, "previous":1} > {"VAA":"C", "VBB":2, "previous":-1} > {"VAA":"d", "VBB":3, "previous":2} > {code} > [null,null,null] > [A,1,null] > [B,-1,1] > [C,2,-1] > [d,3,2] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
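For readers without a cluster at hand, the lag(col, 1) semantics the reporter expects can be checked in plain Java, with no Spark required. This is only a sketch of the windowing contract; `lag1` is an illustrative helper, not a Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of what lag(col, 1) over an ordered window should return:
// each row sees the value of the previous row, and the first row sees null.
public class LagSketch {
    static List<Integer> lag1(List<Integer> ordered) {
        List<Integer> out = new ArrayList<>();
        Integer previous = null; // the first row has no predecessor
        for (Integer v : ordered) {
            out.add(previous);
            previous = v;
        }
        return out;
    }

    public static void main(String[] args) {
        // The VBB column ordered by VAA, with the null row sorting first.
        List<Integer> vbb = Arrays.asList(null, 1, -1, 2, 3);
        System.out.println(lag1(vbb)); // [null, null, 1, -1, 2]
    }
}
```

This matches the "previous" column of the expected result in the report: null, null, 1, -1, 2.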
[jira] [Commented] (SPARK-10894) Add 'drop' support for DataFrame's subset function
[ https://issues.apache.org/jira/browse/SPARK-10894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939339#comment-14939339 ] Weiqiang Zhuang commented on SPARK-10894: - I am working on this. > Add 'drop' support for DataFrame's subset function > -- > > Key: SPARK-10894 > URL: https://issues.apache.org/jira/browse/SPARK-10894 > Project: Spark > Issue Type: Improvement > Components: SparkR >Reporter: Weiqiang Zhuang > > SparkR DataFrame can be subset to get one or more columns of the dataset. The > current '[' implementation does not support 'drop' when it is asked for just one > column. This is not consistent with the R syntax: > x[i, j, ... , drop = TRUE] > # in R, when drop is FALSE, remain as data.frame > > class(iris[, "Sepal.Width", drop=F]) > [1] "data.frame" > # when drop is TRUE (default), drop to be a vector > > class(iris[, "Sepal.Width", drop=T]) > [1] "numeric" > > class(iris[,"Sepal.Width"]) > [1] "numeric" > > df <- createDataFrame(sqlContext, iris) > # in SparkR, 'drop' argument has no impact > > class(df[,"Sepal_Width", drop=F]) > [1] "DataFrame" > attr(,"package") > [1] "SparkR" > # should have dropped to be a Column class instead > > class(df[,"Sepal_Width", drop=T]) > [1] "DataFrame" > attr(,"package") > [1] "SparkR" > > class(df[,"Sepal_Width"]) > [1] "DataFrame" > attr(,"package") > [1] "SparkR" > We should add the 'drop' support. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10894) Add 'drop' support for DataFrame's subset function
Weiqiang Zhuang created SPARK-10894: --- Summary: Add 'drop' support for DataFrame's subset function Key: SPARK-10894 URL: https://issues.apache.org/jira/browse/SPARK-10894 Project: Spark Issue Type: Improvement Components: SparkR Reporter: Weiqiang Zhuang SparkR DataFrame can be subset to get one or more columns of the dataset. The current '[' implementation does not support 'drop' when it is asked for just one column. This is not consistent with the R syntax: x[i, j, ... , drop = TRUE] # in R, when drop is FALSE, remain as data.frame > class(iris[, "Sepal.Width", drop=F]) [1] "data.frame" # when drop is TRUE (default), drop to be a vector > class(iris[, "Sepal.Width", drop=T]) [1] "numeric" > class(iris[,"Sepal.Width"]) [1] "numeric" > df <- createDataFrame(sqlContext, iris) # in SparkR, 'drop' argument has no impact > class(df[,"Sepal_Width", drop=F]) [1] "DataFrame" attr(,"package") [1] "SparkR" # should have dropped to be a Column class instead > class(df[,"Sepal_Width", drop=T]) [1] "DataFrame" attr(,"package") [1] "SparkR" > class(df[,"Sepal_Width"]) [1] "DataFrame" attr(,"package") [1] "SparkR" We should add the 'drop' support. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10891) Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka
[ https://issues.apache.org/jira/browse/SPARK-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10891: Assignee: Apache Spark > Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka > --- > > Key: SPARK-10891 > URL: https://issues.apache.org/jira/browse/SPARK-10891 > Project: Spark > Issue Type: Improvement > Components: Streaming >Reporter: Burak Yavuz >Assignee: Apache Spark > > There is support for a message handler in Direct Kafka Stream, which allows > arbitrary T to be the output of the stream instead of Array[Byte]. This is a > very useful function and should therefore exist in Kinesis as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10891) Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka
[ https://issues.apache.org/jira/browse/SPARK-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939359#comment-14939359 ] Apache Spark commented on SPARK-10891: -- User 'brkyvz' has created a pull request for this issue: https://github.com/apache/spark/pull/8954 > Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka > --- > > Key: SPARK-10891 > URL: https://issues.apache.org/jira/browse/SPARK-10891 > Project: Spark > Issue Type: Improvement > Components: Streaming >Reporter: Burak Yavuz > > There is support for a message handler in Direct Kafka Stream, which allows > arbitrary T to be the output of the stream instead of Array[Byte]. This is a > very useful function and should therefore exist in Kinesis as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10891) Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka
[ https://issues.apache.org/jira/browse/SPARK-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10891: Assignee: (was: Apache Spark) > Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka > --- > > Key: SPARK-10891 > URL: https://issues.apache.org/jira/browse/SPARK-10891 > Project: Spark > Issue Type: Improvement > Components: Streaming >Reporter: Burak Yavuz > > There is support for a message handler in Direct Kafka Stream, which allows > arbitrary T to be the output of the stream instead of Array[Byte]. This is a > very useful function and should therefore exist in Kinesis as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
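The "message handler" idea requested above can be sketched in plain Java: instead of the stream always emitting raw byte arrays, a user-supplied function maps each record to a caller-chosen type T. `Record` and `handle` are illustrative names for this sketch, not Kinesis or Kafka APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Sketch of the message-handler pattern from the Direct Kafka stream:
// the stream applies a user-supplied function byte[] -> T per record,
// so callers pick the element type T instead of always getting byte[].
public class MessageHandlerSketch {
    static final class Record {
        final byte[] payload;
        Record(byte[] payload) { this.payload = payload; }
    }

    // The stream would invoke this once per record; T is chosen by the caller.
    static <T> T handle(Record r, Function<byte[], T> handler) {
        return handler.apply(r.payload);
    }

    public static void main(String[] args) {
        Record r = new Record("hello".getBytes(StandardCharsets.UTF_8));
        // A handler that decodes the payload to a String instead of byte[].
        String s = handle(r, b -> new String(b, StandardCharsets.UTF_8));
        System.out.println(s); // hello
    }
}
```

The design point is that deserialization happens inside the stream, once, rather than forcing every consumer to map over `byte[]` afterwards.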
[jira] [Commented] (SPARK-5569) Checkpoints cannot reference classes defined outside of Spark's assembly
[ https://issues.apache.org/jira/browse/SPARK-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939353#comment-14939353 ] Deming Zhu commented on SPARK-5569: --- I encountered exactly the same problem, and I think it's a bug in ObjectInputStreamWithLoader. ObjectInputStreamWithLoader extends the ObjectInputStream class and overrides its resolveClass method, but instead of using Class.forName(desc, false, loader), Spark uses loader.loadClass(desc) to resolve the class, which does not work for array types. For example: Class.forName("[Ljava.lang.String;", false, loader) works well, while loader.loadClass("[Ljava.lang.String;") throws a ClassNotFoundException. Details of the difference can be found here: http://bugs.java.com/view_bug.do?bug_id=6446627 I will make a pull request to Spark. Before it's accepted, you can redefine ObjectInputStreamWithLoader and use it in place of the original one. > Checkpoints cannot reference classes defined outside of Spark's assembly > > > Key: SPARK-5569 > URL: https://issues.apache.org/jira/browse/SPARK-5569 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: Patrick Wendell > > Not sure if this is a bug or a feature, but it's not obvious, so wanted to > create a JIRA to make sure we document this behavior. 
> First documented by Cody Koeninger: > https://gist.github.com/koeninger/561a61482cd1b5b3600c > {code} > 15/01/12 16:07:07 INFO CheckpointReader: Attempting to load checkpoint from > file file:/var/tmp/cp/checkpoint-142110041.bk > 15/01/12 16:07:07 WARN CheckpointReader: Error reading checkpoint from file > file:/var/tmp/cp/checkpoint-142110041.bk > java.io.IOException: java.lang.ClassNotFoundException: > org.apache.spark.rdd.kafka.KafkaRDDPartition > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1043) > at > org.apache.spark.streaming.dstream.DStreamCheckpointData.readObject(DStreamCheckpointData.scala:146) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) > at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) 
> at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) > at > java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500) > at > org.apache.spark.streaming.DStreamGraph$$anonfun$readObject$1.apply$mcV$sp(DStreamGraph.scala:180) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1040) > at > org.apache.spark.streaming.DStreamGraph.readObject(DStreamGraph.scala:176) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) > at >
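The Class.forName versus loadClass difference described in the comment can be reproduced in a few lines of plain Java. This is a sketch of the behavior documented in JDK bug 6446627; the helper method names are illustrative.

```java
// Demonstrates the resolution difference behind JDK bug 6446627:
// Class.forName can resolve array classes by descriptor, while
// ClassLoader.loadClass cannot.
public class ArrayClassLoading {
    static boolean forNameResolves(String name) {
        try {
            Class.forName(name, false, ClassLoader.getSystemClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    static boolean loadClassResolves(String name) {
        try {
            ClassLoader.getSystemClassLoader().loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String arrayDescriptor = "[Ljava.lang.String;"; // descriptor for String[]
        System.out.println(forNameResolves(arrayDescriptor));   // true
        System.out.println(loadClassResolves(arrayDescriptor)); // false
    }
}
```

Both calls resolve a plain class name like "java.lang.String"; only the array descriptor exposes the difference, which is why the checkpoint failure surfaces when an array of a user class is deserialized.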
[jira] [Commented] (SPARK-9472) Consistent hadoop config for streaming
[ https://issues.apache.org/jira/browse/SPARK-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936866#comment-14936866 ] Cody Koeninger commented on SPARK-9472: --- You can get around this by passing in the Hadoop configuration that you want as an argument to getOrCreate: StreamingContext.getOrCreate(somePath, () => someFunction, SparkHadoopUtil.get.newConfiguration(someSparkConf)) > Consistent hadoop config for streaming > -- > > Key: SPARK-9472 > URL: https://issues.apache.org/jira/browse/SPARK-9472 > Project: Spark > Issue Type: Sub-task > Components: Streaming >Reporter: Cody Koeninger >Assignee: Cody Koeninger >Priority: Minor > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10883) Be able to build each module individually
Jean-Baptiste Onofré created SPARK-10883: Summary: Be able to build each module individually Key: SPARK-10883 URL: https://issues.apache.org/jira/browse/SPARK-10883 Project: Spark Issue Type: Improvement Components: Build Reporter: Jean-Baptiste Onofré Right now, due to the location of the scalastyle-config.xml file, it's not possible to build an individual module. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10884) Support prediction on single instance for PredictionModel and ClassificationModel
[ https://issues.apache.org/jira/browse/SPARK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936951#comment-14936951 ] Apache Spark commented on SPARK-10884: -- User 'yanboliang' has created a pull request for this issue: https://github.com/apache/spark/pull/8883 > Support prediction on single instance for PredictionModel and > ClassificationModel > - > > Key: SPARK-10884 > URL: https://issues.apache.org/jira/browse/SPARK-10884 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Yanbo Liang > > Support prediction on single instance for regression and classification > related models. > Add corresponding test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10883) Be able to build each module individually
[ https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936898#comment-14936898 ] Apache Spark commented on SPARK-10883: -- User 'jbonofre' has created a pull request for this issue: https://github.com/apache/spark/pull/8949 > Be able to build each module individually > - > > Key: SPARK-10883 > URL: https://issues.apache.org/jira/browse/SPARK-10883 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Jean-Baptiste Onofré > > Right now, due to the location of the scalastyle-config.xml file, it's > not possible to build an individual module. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10883) Be able to build each module individually
[ https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10883: Assignee: Apache Spark > Be able to build each module individually > - > > Key: SPARK-10883 > URL: https://issues.apache.org/jira/browse/SPARK-10883 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Jean-Baptiste Onofré >Assignee: Apache Spark > > Right now, due to the location of the scalastyle-config.xml file, it's > not possible to build an individual module. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10883) Be able to build each module individually
[ https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936944#comment-14936944 ] Marcelo Vanzin commented on SPARK-10883: How are you trying to build the individual module? Something like this works fine for me and invokes scalastyle successfully: {code} mvn install -pl :spark-launcher_2.10 {code} I believe you run into problems if you use {{-f path/to/pom.xml}}, which is mildly annoying considering there is an alternative. But as your PR shows, fixing that case is kinda noisy. > Be able to build each module individually > - > > Key: SPARK-10883 > URL: https://issues.apache.org/jira/browse/SPARK-10883 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Jean-Baptiste Onofré > > Right now, due to the location of the scalastyle-config.xml file, it's > not possible to build an individual module. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10413) Model should support prediction on single instance
[ https://issues.apache.org/jira/browse/SPARK-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936947#comment-14936947 ] Yanbo Liang commented on SPARK-10413: - [~mengxr] [~josephkb] I found this issue involved too many models and files, so let's separate this JIRA into sub tasks. I have opened SPARK-10884 to make all classification and regression model support prediction on single instance. Other community members who are interested in this issue can open other sub tasks and work on them. > Model should support prediction on single instance > -- > > Key: SPARK-10413 > URL: https://issues.apache.org/jira/browse/SPARK-10413 > Project: Spark > Issue Type: Umbrella > Components: ML >Reporter: Xiangrui Meng >Priority: Critical > > Currently models in the pipeline API only implement transform(DataFrame). It > would be quite useful to support prediction on single instance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
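The API gap in the umbrella issue above is that pipeline models only expose a bulk transform(DataFrame). The relationship between that and single-instance prediction can be sketched in plain Java; `LinearModel` here is a hypothetical stand-in, not a Spark ML class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the requested API shape: a model offering a single-instance
// predict, with the bulk transform expressed in terms of it.
public class LinearModel {
    private final double[] weights;
    LinearModel(double[] weights) { this.weights = weights; }

    // Single-instance prediction: no DataFrame machinery needed.
    double predict(double[] features) {
        double dot = 0.0;
        for (int i = 0; i < weights.length; i++) {
            dot += weights[i] * features[i];
        }
        return dot;
    }

    // Bulk path reuses the single-instance method per row.
    List<Double> transform(List<double[]> dataset) {
        return dataset.stream().map(this::predict).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LinearModel m = new LinearModel(new double[] {2.0, -1.0});
        System.out.println(m.predict(new double[] {3.0, 1.0})); // 5.0
        System.out.println(m.transform(
            Arrays.asList(new double[] {1.0, 0.0}, new double[] {0.0, 1.0})));
    }
}
```

Exposing predict directly avoids constructing a one-row dataset just to score a single instance, which is the motivation stated in the issue.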
[jira] [Assigned] (SPARK-10884) Support prediction on single instance for PredictionModel and ClassificationModel
[ https://issues.apache.org/jira/browse/SPARK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10884: Assignee: (was: Apache Spark) > Support prediction on single instance for PredictionModel and > ClassificationModel > - > > Key: SPARK-10884 > URL: https://issues.apache.org/jira/browse/SPARK-10884 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Yanbo Liang > > Support prediction on single instance for regression and classification > related models. > Add corresponding test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10884) Support prediction on single instance for PredictionModel and ClassificationModel
[ https://issues.apache.org/jira/browse/SPARK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10884: Assignee: Apache Spark > Support prediction on single instance for PredictionModel and > ClassificationModel > - > > Key: SPARK-10884 > URL: https://issues.apache.org/jira/browse/SPARK-10884 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Yanbo Liang >Assignee: Apache Spark > > Support prediction on single instance for regression and classification > related models. > Add corresponding test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10883) Be able to build each module individually
[ https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10883: Assignee: (was: Apache Spark) > Be able to build each module individually > - > > Key: SPARK-10883 > URL: https://issues.apache.org/jira/browse/SPARK-10883 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Jean-Baptiste Onofré > > Right now, due to the location of the scalastyle-config.xml file, it's > not possible to build an individual module. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10883) Be able to build each module individually
[ https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936899#comment-14936899 ] Jean-Baptiste Onofré commented on SPARK-10883: -- https://github.com/apache/spark/pull/8949 > Be able to build each module individually > - > > Key: SPARK-10883 > URL: https://issues.apache.org/jira/browse/SPARK-10883 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Jean-Baptiste Onofré > > Right now, due to the location of the scalastyle-config.xml file, it's > not possible to build an individual module. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-10883) Be able to build each module individually
[ https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Baptiste Onofré updated SPARK-10883: - Comment: was deleted (was: https://github.com/apache/spark/pull/8949) > Be able to build each module individually > - > > Key: SPARK-10883 > URL: https://issues.apache.org/jira/browse/SPARK-10883 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Jean-Baptiste Onofré > > Right now, due to the location of the scalastyle-config.xml file, it's > not possible to build an individual module. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10884) Support prediction on single instance for PredictionModel and ClassificationModel
Yanbo Liang created SPARK-10884: --- Summary: Support prediction on single instance for PredictionModel and ClassificationModel Key: SPARK-10884 URL: https://issues.apache.org/jira/browse/SPARK-10884 Project: Spark Issue Type: Sub-task Components: ML Reporter: Yanbo Liang Support prediction on single instance for PredictionModel, ClassificationModel and their subclass. Add corresponding test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10884) Support prediction on single instance for PredictionModel and ClassificationModel
[ https://issues.apache.org/jira/browse/SPARK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-10884: Description: Support prediction on single instance for regression and classification related models. Add corresponding test cases. was:Support prediction on single instance for PredictionModel, ClassificationModel and their subclass. Add corresponding test cases. > Support prediction on single instance for PredictionModel and > ClassificationModel > - > > Key: SPARK-10884 > URL: https://issues.apache.org/jira/browse/SPARK-10884 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Yanbo Liang > > Support prediction on single instance for regression and classification > related models. > Add corresponding test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937473#comment-14937473 ] Apache Spark commented on SPARK-10058: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/8946 > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Reporter: Davies Liu >Assignee: Shixiong Zhu >Priority: Critical > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > 
org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at
[jira] [Assigned] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-10058:

    Assignee: Apache Spark  (was: Shixiong Zhu)

> Flaky test: HeartbeatReceiverSuite: normal heartbeat
> ----------------------------------------------------
>
>                 Key: SPARK-10058
>                 URL: https://issues.apache.org/jira/browse/SPARK-10058
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Tests
>            Reporter: Davies Liu
>            Assignee: Apache Spark
>            Priority: Critical
>              Labels: flaky-test
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/
> {code}
> Error Message
> 3 did not equal 2
> Stacktrace
> sbt.ForkMain$ForkError: 3 did not equal 2
> 	at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
> 	at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
> 	at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
> 	at org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104)
> 	at org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
> 	at org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
> 	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
> 	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> 	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 	at org.scalatest.Transformer.apply(Transformer.scala:22)
> 	at org.scalatest.Transformer.apply(Transformer.scala:20)
> 	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
> 	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
> 	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
> 	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
> 	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
> 	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> 	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
> 	at org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41)
> 	at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
> 	at org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41)
> 	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
> 	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
> 	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
> 	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
> 	at scala.collection.immutable.List.foreach(List.scala:318)
> 	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> 	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
> 	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
> 	at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
> 	at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
> 	at org.scalatest.Suite$class.run(Suite.scala:1424)
> 	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
> 	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
> 	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
> 	at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
> 	at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
> 	at org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41)
> 	at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
> 	at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
> 	at org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41)
> 	at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
> 	at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
> 	at sbt.ForkMain$Run$2.call(ForkMain.java:294)
> 	at sbt.ForkMain$Run$2.call(ForkMain.java:284)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> {code}
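The failure above ("3 did not equal 2") has the signature of a timing race: the suite asserts on a count of asynchronously delivered heartbeat messages, so an extra message can arrive before the assertion runs. A common remedy for this class of flakiness, shown here only as an illustrative sketch (the actual fix for SPARK-10058 may differ), is to poll the assertion until it holds within a bounded number of retries, as ScalaTest's `Eventually` trait does. A minimal stand-alone version of that retry loop:

```scala
object RetryDemo {
  /** Re-run `body` until it stops throwing, up to `retries` extra attempts,
    * sleeping `delayMs` between attempts; rethrows the last failure. */
  def eventually[T](retries: Int, delayMs: Long)(body: => T): T = {
    var attempts = 0
    while (true) {
      try return body
      catch {
        case e: Throwable =>
          if (attempts >= retries) throw e
          attempts += 1
          Thread.sleep(delayMs)
      }
    }
    throw new IllegalStateException("unreachable")
  }
}
```

A test that wraps its count assertion in such a loop tolerates late-arriving messages instead of failing on the first read of the counter.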
[jira] [Commented] (SPARK-9472) Consistent hadoop config for streaming
[ https://issues.apache.org/jira/browse/SPARK-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937471#comment-14937471 ]

Russell Alexander Spitzer commented on SPARK-9472:
--------------------------------------------------

Yeah, that's the workaround we recommend now, but it requires every application to manually specify the files. We just want our distribution to not require that much manual intervention (normally we automatically pass in the required Hadoop conf based on the user's security setup via spark.hadoop).

> Consistent hadoop config for streaming
> --------------------------------------
>
>                 Key: SPARK-9472
>                 URL: https://issues.apache.org/jira/browse/SPARK-9472
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Streaming
>            Reporter: Cody Koeninger
>            Assignee: Cody Koeninger
>            Priority: Minor
>             Fix For: 1.5.0
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
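The `spark.hadoop` mechanism referenced in the comment above lets a distribution forward Hadoop settings through the Spark configuration instead of shipping `*-site.xml` files with every application: Spark copies any configuration key carrying the `spark.hadoop.` prefix into the Hadoop `Configuration` it constructs. A minimal stand-alone sketch of that prefix handling (plain Scala, no Spark dependency; the property names are ordinary Hadoop security keys used purely for illustration):

```scala
object HadoopPrefixDemo {
  val SparkHadoopPrefix = "spark.hadoop."

  /** Extract the Hadoop properties from a Spark-style config map by
    * stripping the "spark.hadoop." prefix, leaving other keys behind. */
  def hadoopProps(sparkConf: Map[String, String]): Map[String, String] =
    sparkConf.collect {
      case (k, v) if k.startsWith(SparkHadoopPrefix) =>
        k.stripPrefix(SparkHadoopPrefix) -> v
    }

  def main(args: Array[String]): Unit = {
    val conf = Map(
      "spark.app.name" -> "streaming-demo",
      "spark.hadoop.hadoop.security.authentication" -> "kerberos"
    )
    // Only the prefixed key survives, with the prefix removed.
    println(hadoopProps(conf))
  }
}
```

Under this scheme an operator can inject, for example, `--conf spark.hadoop.hadoop.security.authentication=kerberos` at submit time, and every Hadoop client the job creates sees the setting without the application shipping config files itself.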