[jira] [Commented] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-09-30 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938249#comment-14938249
 ] 

Reynold Xin commented on SPARK-10474:
-

Ah, I see. It is possible that the core-count estimation is completely off in Mesos fine-grained mode.

[~hbogert] can you print the page size and the number of cores in this function to check their values?
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174
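
For reference, a minimal spark-shell sketch of the two values worth printing (this is an assumption about what is being asked for, not the actual ShuffleMemoryManager code; {{sc}} is an existing SparkContext):

{code}
// Hedged sketch: print the inputs that typically feed the page-size heuristic.
// "spark.executor.cores" may be unset in Mesos fine-grained mode, in which case
// the JVM's own processor count is a likely fallback.
val numCores = sc.getConf.getInt("spark.executor.cores",
  Runtime.getRuntime.availableProcessors())
val maxMemory = Runtime.getRuntime.maxMemory()
println(s"numCores = $numCores, maxMemory = $maxMemory bytes")
{code}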


> TungstenAggregation cannot acquire memory for pointer array after switching 
> to sort-based
> -
>
> Key: SPARK-10474
> URL: https://issues.apache.org/jira/browse/SPARK-10474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> In an aggregation case, a lost task occurred with the error below.
> {code}
>  java.io.IOException: Could not acquire 65536 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
> at 
> org.apache.spark.sql.execution.UnsafeKVExternalSorter.(UnsafeKVExternalSorter.java:126)
> at 
> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:379)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Key SQL Query
> {code:sql}
> INSERT INTO TABLE test_table
> SELECT
>   ss.ss_customer_sk AS cid,
>   count(CASE WHEN i.i_class_id=1  THEN 1 ELSE NULL END) AS id1,
>   count(CASE WHEN i.i_class_id=3  THEN 1 ELSE NULL END) AS id3,
>   count(CASE WHEN i.i_class_id=5  THEN 1 ELSE NULL END) AS id5,
>   count(CASE WHEN i.i_class_id=7  THEN 1 ELSE NULL END) AS id7,
>   count(CASE WHEN i.i_class_id=9  THEN 1 ELSE NULL END) AS id9,
>   count(CASE WHEN i.i_class_id=11 THEN 1 ELSE NULL END) AS id11,
>   count(CASE WHEN i.i_class_id=13 THEN 1 ELSE NULL END) AS id13,
>   count(CASE WHEN i.i_class_id=15 THEN 1 ELSE NULL END) AS id15,
>   count(CASE WHEN i.i_class_id=2  THEN 1 ELSE NULL END) AS id2,
>   count(CASE WHEN i.i_class_id=4  THEN 1 ELSE NULL END) AS id4,
>   count(CASE WHEN i.i_class_id=6  THEN 1 ELSE NULL END) AS id6,
>   count(CASE WHEN i.i_class_id=8  THEN 1 ELSE NULL END) AS id8,
>   count(CASE WHEN i.i_class_id=10 THEN 1 ELSE NULL END) AS id10,
>   count(CASE WHEN i.i_class_id=14 THEN 1 ELSE NULL END) AS id14,
>   count(CASE WHEN i.i_class_id=16 THEN 1 ELSE NULL END) AS id16
> FROM store_sales ss
> INNER JOIN item i ON ss.ss_item_sk = i.i_item_sk
> WHERE i.i_category IN ('Books')
> AND ss.ss_customer_sk IS NOT NULL
> GROUP BY ss.ss_customer_sk
> 

[jira] [Commented] (SPARK-10855) Add a JDBC dialect for Apache Derby

2015-09-30 Thread Rick Hillegas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938334#comment-14938334
 ] 

Rick Hillegas commented on SPARK-10855:
---

I intend to create a pull request for this soon after running the tests and 
style checks.

> Add a JDBC dialect for Apache Derby
> 
>
> Key: SPARK-10855
> URL: https://issues.apache.org/jira/browse/SPARK-10855
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Rick Hillegas
>Priority: Minor
>
> In particular, it would be good if the dialect could handle Derby's 
> user-defined types. The following script fails:
> {noformat}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> // the following script was used to create a Derby table
> // which has a column of user-defined type:
> // 
> // create type properties external name 'java.util.Properties' language java;
> // 
> // create function systemProperties() returns properties
> // language java parameter style java no sql
> // external name 'java.lang.System.getProperties';
> // 
> // create table propertiesTable( props properties );
> // 
> // insert into propertiesTable values ( null ), ( systemProperties() );
> // 
> // select * from propertiesTable;
> // cannot handle a table which has a column of type 
> java.sql.Types.JAVA_OBJECT:
> //
> // java.sql.SQLException: Unsupported type 2000
> //
> val df = sqlContext.read.format("jdbc").options( 
>   Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1",
>   "dbtable" -> "app.propertiesTable")).load()
> // shutdown the Derby engine
> val shutdown = sqlContext.read.format("jdbc").options( 
>   Map("url" -> "jdbc:derby:;shutdown=true",
>   "dbtable" -> "")).load()
> exit()
> {noformat}
> The inability to handle user-defined types probably affects other databases 
> besides Derby.
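
A minimal sketch of what such a dialect might look like (illustrative only: the registration pattern follows the existing JdbcDialects API, but mapping JAVA_OBJECT to BinaryType is an assumption, not an agreed design):

{code}
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types._

// Hedged sketch of a Derby dialect; the JAVA_OBJECT mapping is a placeholder.
case object DerbyDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:derby")
  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    sqlType match {
      case Types.JAVA_OBJECT => Some(BinaryType) // placeholder for user-defined types
      case _ => None
    }
}

JdbcDialects.registerDialect(DerbyDialect)
{code}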






[jira] [Resolved] (SPARK-10770) SparkPlan.executeCollect/executeTake should return InternalRow rather than external Row

2015-09-30 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-10770.
-
   Resolution: Fixed
Fix Version/s: 1.6.0

> SparkPlan.executeCollect/executeTake should return InternalRow rather than 
> external Row
> ---
>
> Key: SPARK-10770
> URL: https://issues.apache.org/jira/browse/SPARK-10770
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 1.6.0
>
>







[jira] [Updated] (SPARK-10851) Exception not failing R applications (in yarn cluster mode)

2015-09-30 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-10851:
--
Assignee: Sun Rui

> Exception not failing R applications (in yarn cluster mode)
> ---
>
> Key: SPARK-10851
> URL: https://issues.apache.org/jira/browse/SPARK-10851
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, YARN
>Affects Versions: 1.5.0, 1.5.1
>Reporter: Zsolt Tóth
>Assignee: Sun Rui
> Fix For: 1.6.0
>
>
> The bug is the R version of SPARK-7736. The R script fails with an exception 
> but the Yarn application status is SUCCEEDED.






[jira] [Resolved] (SPARK-10851) Exception not failing R applications (in yarn cluster mode)

2015-09-30 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-10851.
---
  Resolution: Fixed
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> Exception not failing R applications (in yarn cluster mode)
> ---
>
> Key: SPARK-10851
> URL: https://issues.apache.org/jira/browse/SPARK-10851
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, YARN
>Affects Versions: 1.5.0, 1.5.1
>Reporter: Zsolt Tóth
>Assignee: Sun Rui
> Fix For: 1.6.0
>
>
> The bug is the R version of SPARK-7736. The R script fails with an exception 
> but the Yarn application status is SUCCEEDED.






[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior

2015-09-30 Thread Xin Ren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938337#comment-14938337
 ] 

Xin Ren commented on SPARK-9344:


This is my first try; is it OK if I work on this one?

Thanks a lot. :P

> Spark SQL documentation does not clarify INSERT INTO behavior
> -
>
> Key: SPARK-9344
> URL: https://issues.apache.org/jira/browse/SPARK-9344
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 1.4.1
>Reporter: Simeon Simeonov
>Priority: Minor
>  Labels: documentation, sql
>
> The Spark SQL documentation does not address {{INSERT INTO}} behavior. The 
> section on Hive compatibility is misleading as it claims support for "the 
> vast majority of Hive features". The user mailing list has conflicting 
> information, including posts that claim {{INSERT INTO}} support targeting 1.0.
> In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.






[jira] [Commented] (SPARK-10857) SQL injection bug in JdbcDialect.getTableExistsQuery()

2015-09-30 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937732#comment-14937732
 ] 

Suresh Thalamati commented on SPARK-10857:
--

One issue I ran into with the getSchema() call: even if Spark requires Java 7 and 
above, the JDBC driver versions customers are using may not support getSchema().

I tried a couple of databases I had and got an error on getSchema(). It is 
possible I have old drivers.
postgresql-9.3-1101-jdbc4.jar
/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/bin/java -
Exception in thread "main" java.sql.SQLFeatureNotSupportedException: Method 
org.postgresql.jdbc4.Jdbc4Connection.getSchema() is not yet implemented.
at org.postgresql.Driver.notImplemented(Driver.java:729)
at 
org.postgresql.jdbc4.AbstractJdbc4Connection.getSchema(AbstractJdbc4Connection.java:239)

MySQL:
Implementation-Version: 5.1.17-SNAPSHOT
/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/
Exception in thread "main" java.sql.SQLFeatureNotSupportedException: Not 
supported
at com.mysql.jdbc.JDBC4Connection.getSchema(JDBC4Connection.java:253)
...



> SQL injection bug in JdbcDialect.getTableExistsQuery()
> --
>
> Key: SPARK-10857
> URL: https://issues.apache.org/jira/browse/SPARK-10857
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Rick Hillegas
>Priority: Minor
>
> All of the implementations of this method involve constructing a query by 
> concatenating boilerplate text with a user-supplied name. This looks like a 
> SQL injection bug to me.
> A better solution would be to call java.sql.DatabaseMetaData.getTables() to 
> implement this method, using the catalog and schema which are available from 
> Connection.getCatalog() and Connection.getSchema(). This would not work on 
> Java 6 because Connection.getSchema() was introduced in Java 7. However, the 
> solution would work for more modern JVMs. Limiting the vulnerability to 
> obsolete JVMs would at least be an improvement over the current situation. 
> Java 6 has been end-of-lifed and is not an appropriate platform for users who 
> are concerned about security.
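
A minimal sketch of the metadata-based check described above (an illustration of the idea, not a patch; it falls back to a null schema when the driver does not implement getSchema()):

{code}
import java.sql.{Connection, SQLException}

// Hedged sketch: check table existence via DatabaseMetaData instead of
// concatenating the user-supplied table name into a SQL string.
def tableExists(conn: Connection, table: String): Boolean = {
  // Older drivers may throw SQLFeatureNotSupportedException from getSchema().
  val schema = try Option(conn.getSchema) catch { case _: SQLException => None }
  val rs = conn.getMetaData.getTables(conn.getCatalog, schema.orNull, table, null)
  try rs.next() finally rs.close()
}
{code}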






[jira] [Closed] (SPARK-3862) MultiWayBroadcastInnerHashJoin

2015-09-30 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin closed SPARK-3862.
--
Resolution: Won't Fix

Closing this one for now since I think we can do something better with codegen 
without building specialized operators.

> MultiWayBroadcastInnerHashJoin
> --
>
> Key: SPARK-3862
> URL: https://issues.apache.org/jira/browse/SPARK-3862
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>
> It is common to have a single fact table inner join many small dimension 
> tables.  We can exploit this fact and create a MultiWayBroadcastInnerHashJoin 
> (or maybe just MultiwayDimensionJoin) operator that optimizes for this 
> pattern.
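
For illustration only, the query shape being described, written against today's DataFrame API (sales, customers, products and stores are assumed DataFrames; all names are made up):

{code}
import org.apache.spark.sql.functions.broadcast

// One large fact table joined with several small dimension tables; today this
// plans as a chain of independent broadcast hash joins.
val enriched = sales
  .join(broadcast(customers), "customer_id")
  .join(broadcast(products), "product_id")
  .join(broadcast(stores), "store_id")
{code}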






[jira] [Commented] (SPARK-3862) MultiWayBroadcastInnerHashJoin

2015-09-30 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938243#comment-14938243
 ] 

Reynold Xin commented on SPARK-3862:


David,

Thanks. Let's chat there. Since creating the ticket, I've had new thoughts on 
how we can do something better with codegen, rather than writing specialized 
operators.


> MultiWayBroadcastInnerHashJoin
> --
>
> Key: SPARK-3862
> URL: https://issues.apache.org/jira/browse/SPARK-3862
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>
> It is common to have a single fact table inner join many small dimension 
> tables.  We can exploit this fact and create a MultiWayBroadcastInnerHashJoin 
> (or maybe just MultiwayDimensionJoin) operator that optimizes for this 
> pattern.






[jira] [Commented] (SPARK-10515) When killing executor, the pending replacement executors will be lost

2015-09-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936475#comment-14936475
 ] 

Apache Spark commented on SPARK-10515:
--

User 'KaiXinXiaoLei' has created a pull request for this issue:
https://github.com/apache/spark/pull/8945

> When killing executor, the pending replacement executors will be lost
> -
>
> Key: SPARK-10515
> URL: https://issues.apache.org/jira/browse/SPARK-10515
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.1
>Reporter: KaiXinXIaoLei
> Fix For: 1.6.0
>
>







[jira] [Resolved] (SPARK-10736) Use 1 for all ratings if $(ratingCol) = ""

2015-09-30 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-10736.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 8937
[https://github.com/apache/spark/pull/8937]

> Use 1 for all ratings if $(ratingCol) = ""
> --
>
> Key: SPARK-10736
> URL: https://issues.apache.org/jira/browse/SPARK-10736
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.6.0
>Reporter: Xiangrui Meng
>Priority: Minor
> Fix For: 1.6.0
>
>
> For some implicit datasets, ratings may not exist in the training data. In 
> this case, we can assume all observed pairs to be positive and treat their 
> ratings as 1. This should happen when users set ratingCol to an empty string.
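
A hedged sketch of the intended user-facing behavior (assumes a DataFrame {{df}} with "user" and "item" columns and no rating column; not the final API decision):

{code}
import org.apache.spark.ml.recommendation.ALS

val als = new ALS()
  .setUserCol("user")
  .setItemCol("item")
  .setRatingCol("")        // empty string: treat every observed pair as rating 1.0
  .setImplicitPrefs(true)
val model = als.fit(df)
{code}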






[jira] [Commented] (SPARK-10800) Flaky test: org.apache.spark.deploy.StandaloneDynamicAllocationSuite

2015-09-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936521#comment-14936521
 ] 

Apache Spark commented on SPARK-10800:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/8946

> Flaky test: org.apache.spark.deploy.StandaloneDynamicAllocationSuite
> 
>
> Key: SPARK-10800
> URL: https://issues.apache.org/jira/browse/SPARK-10800
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.6.0
>Reporter: Xiangrui Meng
>Assignee: Shixiong Zhu
>  Labels: flaky-test
>
> Saw several failures on master:
> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/3622/HADOOP_PROFILE=hadoop-2.4,label=spark-test/testReport/junit/org.apache.spark.deploy/
> {code}
> org.apache.spark.deploy.StandaloneDynamicAllocationSuite.dynamic allocation 
> default behavior
> Failing for the past 1 build (Since Failed#3622 )
> Took 0.12 sec.
> add description
> Error Message
> 1 did not equal 2
> Stacktrace
>   org.scalatest.exceptions.TestFailedException: 1 did not equal 2
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply$mcV$sp(StandaloneDynamicAllocationSuite.scala:78)
>   at 
> org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply(StandaloneDynamicAllocationSuite.scala:73)
>   at 
> org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply(StandaloneDynamicAllocationSuite.scala:73)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 
> org.apache.spark.deploy.StandaloneDynamicAllocationSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(StandaloneDynamicAllocationSuite.scala:33)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
>   at 
> org.apache.spark.deploy.StandaloneDynamicAllocationSuite.runTest(StandaloneDynamicAllocationSuite.scala:33)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.deploy.StandaloneDynamicAllocationSuite.org$scalatest$BeforeAndAfterAll$$super$run(StandaloneDynamicAllocationSuite.scala:33)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at 
> 

[jira] [Updated] (SPARK-10869) Auto-normalization of semi-structured schema from a dataframe

2015-09-30 Thread Julien Genini (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Genini updated SPARK-10869:
--
Target Version/s:   (was: 1.5.1)

> Auto-normalization of semi-structured schema from a dataframe
> -
>
> Key: SPARK-10869
> URL: https://issues.apache.org/jira/browse/SPARK-10869
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 1.5.1
>Reporter: Julien Genini
>Priority: Minor
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Today you can get a multi-depth schema from a semi-structured dataframe 
> (XML, JSON, etc.).
> That is not easy to deal with in data warehousing, where it is better to 
> normalize the data.
> I propose adding an option when you get the schema (normalized, default False).
> The returned JSON schema would then contain the normalized path for each 
> field, and the list of the different node levels:
> df = sqlContext.read.json(jsonPath)
> jsonLinearSchema = df.schema.jsonValue(normalized=True)
> >>
> {code}
> {'fields': [{'fullPathName': 'SiteXML.BusinessDate',  
>   
>  'metadata': {},
>  'name': 'BusinessDate',
>  'nullable': True,
>  'type': 'string'},
> {'fullPathName': 'SiteXML.Site_List.Site.Id_Group',
>  'metadata': {},
>  'name': 'Id_Group',
>  'nullable': True,
>  'type': 'string'},
> {'fullPathName': 'SiteXML.Site_List.Site.Id_Site',
>  'metadata': {},
>  'name': 'Id_Site',
>  'nullable': True,
>  'type': 'string'},
> {'fullPathName': 'SiteXML.Site_List.Site.libelle',
>  'metadata': {},
>  'name': 'libelle',
>  'nullable': True,
>  'type': 'string'},
> {'fullPathName': 'SiteXML.Site_List.Site.libelle_Group',
>  'metadata': {},
>  'name': 'libelle_Group',
>  'nullable': True,
>  'type': 'string'},
> {'fullPathName': 'SiteXML.TimeStamp',
>  'metadata': {},
>  'name': 'TimeStamp',
>  'nullable': True,
>  'type': 'string'}],
>  'nodes': [{'fieldsFullPathName': ['SiteXML.BusinessDate',
>'SiteXML.TimeStamp'],
> 'fullPathName': 'SiteXML',
> 'nbFields': 2},
>{'fieldsFullPathName': ['SiteXML.Site_List.Site.Id_Group',
>'SiteXML.Site_List.Site.Id_Site',
>'SiteXML.Site_List.Site.libelle',
>'SiteXML.Site_List.Site.libelle_Group'],
> 'fullPathName': 'SiteXML.Site_List.Site',
> 'nbFields': 4}]}
> {code}
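
For illustration, a hedged sketch (not the proposed API) of how the dotted full path names in the example above can be collected from a nested schema today:

{code}
import org.apache.spark.sql.types._

// Walk a nested StructType and collect dotted full path names, similar to the
// 'fullPathName' entries shown above.
def fullPathNames(schema: StructType, prefix: String = ""): Seq[String] =
  schema.fields.toSeq.flatMap { f =>
    val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
    f.dataType match {
      case s: StructType => fullPathNames(s, path)
      case ArrayType(s: StructType, _) => fullPathNames(s, path)
      case _ => Seq(path)
    }
  }
{code}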






[jira] [Commented] (SPARK-6028) Provide an alternative RPC implementation based on the network transport module

2015-09-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936411#comment-14936411
 ] 

Apache Spark commented on SPARK-6028:
-

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/8944

> Provide an alternative RPC implementation based on the network transport 
> module
> ---
>
> Key: SPARK-6028
> URL: https://issues.apache.org/jira/browse/SPARK-6028
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Shixiong Zhu
>Priority: Critical
> Fix For: 1.6.0
>
>
> Network transport module implements a low level RPC interface. We can build a 
> new RPC implementation on top of that to replace Akka's.
> Design document: 
> https://docs.google.com/document/d/1CF5G6rGVQMKSyV_QKo4D2M-x6rxz5x1Ew7aK3Uq6u8c/edit?usp=sharing






[jira] [Commented] (SPARK-10687) Discuss nonparametric survival analysis model

2015-09-30 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936544#comment-14936544
 ] 

Yanbo Liang commented on SPARK-10687:
-

[~mengxr] Because the Cox Proportional Hazards model is not easy to implement 
efficiently in Spark, I think the most commonly used non-parametric survival 
analysis model is the Kaplan-Meier model. But it only gives us an "average" 
view of the population rather than a regression over covariates. I have updated 
the design document of SPARK-8518; please feel free to comment on the section 
"Planning for the future". 
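
For reference (added context, not part of the original comment), the Kaplan-Meier estimator of the survival function is:

{noformat}
\hat{S}(t) = \prod_{i : t_i \le t} (1 - d_i / n_i)
{noformat}

where d_i is the number of events observed at time t_i and n_i is the number of subjects still at risk just before t_i.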

> Discuss nonparametric survival analysis model
> -
>
> Key: SPARK-10687
> URL: https://issues.apache.org/jira/browse/SPARK-10687
> Project: Spark
>  Issue Type: Brainstorming
>  Components: ML
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> Created this JIRA to discuss nonparametric survival models and feasibility to 
> implement them on Spark. Please also check the design doc posted on 
> SPARK-8518.






[jira] [Updated] (SPARK-10736) Use 1 for all ratings if $(ratingCol) = ""

2015-09-30 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-10736:
--
Assignee: Yanbo Liang

> Use 1 for all ratings if $(ratingCol) = ""
> --
>
> Key: SPARK-10736
> URL: https://issues.apache.org/jira/browse/SPARK-10736
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.6.0
>Reporter: Xiangrui Meng
>Assignee: Yanbo Liang
>Priority: Minor
> Fix For: 1.6.0
>
>
> For some implicit datasets, ratings may not exist in the training data. In 
> this case, we can assume all observed pairs to be positive and treat their 
> ratings as 1. This should happen when users set ratingCol to an empty string.






[jira] [Resolved] (SPARK-10811) Minimize array copying cost in Parquet converters

2015-09-30 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian resolved SPARK-10811.

   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 8907
[https://github.com/apache/spark/pull/8907]

> Minimize array copying cost in Parquet converters
> -
>
> Key: SPARK-10811
> URL: https://issues.apache.org/jira/browse/SPARK-10811
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
> Fix For: 1.6.0
>
>
> When converting Parquet {{Binary}} values to {{UTF8String}} and {{Decimal}} 
> values, there exists unnecessary array copying costs ({{Binary.getBytes()}}), 
> which can be eliminated for better performance.
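
A hedged sketch of the idea (illustrative only, under the assumption that the value is array-backed; not necessarily the shape of the actual patch):

{code}
import org.apache.parquet.io.api.Binary
import org.apache.spark.unsafe.types.UTF8String

// Avoid Binary.getBytes(), which copies the backing array, by reading through
// the ByteBuffer view when the binary is backed by an accessible array.
def toUTF8String(value: Binary): UTF8String = {
  val buffer = value.toByteBuffer
  if (buffer.hasArray) {
    UTF8String.fromBytes(
      buffer.array(), buffer.arrayOffset() + buffer.position(), buffer.remaining())
  } else {
    UTF8String.fromBytes(value.getBytes) // fallback: copies
  }
}
{code}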






[jira] [Commented] (SPARK-10268) Add @Since annotation to ml.tree

2015-09-30 Thread Hiroshi Takahashi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936485#comment-14936485
 ] 

Hiroshi Takahashi commented on SPARK-10268:
---

[~mengxr] Could you take a look?

> Add @Since annotation to ml.tree
> 
>
> Key: SPARK-10268
> URL: https://issues.apache.org/jira/browse/SPARK-10268
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML
>Reporter: Xiangrui Meng
>Priority: Minor
>  Labels: starter
>
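
For context, a small illustration of what the requested annotation looks like once added (the class, member and version numbers are made up for the example):

{code}
import org.apache.spark.annotation.Since

// Hedged example only: annotate public classes and members with the version
// in which they first appeared.
@Since("1.4.0")
class ExampleEstimator {
  @Since("1.5.0")
  def explain(): String = "example"
}
{code}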







[jira] [Assigned] (SPARK-10886) Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10886:


Assignee: (was: Apache Spark)

> Random RDD creation example fix in MLlib statistics programming guide - 
> mllib-statistics.md
> ---
>
> Key: SPARK-10886
> URL: https://issues.apache.org/jira/browse/SPARK-10886
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Jayant Shekhar
>Priority: Minor
>
> Creating Random RDDs has the following line in the example for Random Data 
> Generation in the MLlib statistics programming guide:
> val u = normalRDD(sc, 100L, 10)
> It should be:
> val u = RandomRDDs.normalRDD(sc, 100L, 10)
> It applies to both the Scala and Java examples.






[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior

2015-09-30 Thread Xin Ren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938811#comment-14938811
 ] 

Xin Ren commented on SPARK-9344:


Hi, I'll try to fix the docs to get a sense of how the commit process works. Is 
that OK?

Or could you please recommend some tickets that are easy so I can get started? 
Thanks a lot

> Spark SQL documentation does not clarify INSERT INTO behavior
> -
>
> Key: SPARK-9344
> URL: https://issues.apache.org/jira/browse/SPARK-9344
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 1.4.1
>Reporter: Simeon Simeonov
>Priority: Minor
>  Labels: documentation, sql
>
> The Spark SQL documentation does not address {{INSERT INTO}} behavior. The 
> section on Hive compatibility is misleading as it claims support for "the 
> vast majority of Hive features". The user mailing list has conflicting 
> information, including posts that claim {{INSERT INTO}} support targeting 1.0.
> In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.






[jira] [Created] (SPARK-10887) Build HashRelation outside of HashJoinNode

2015-09-30 Thread Yin Huai (JIRA)
Yin Huai created SPARK-10887:


 Summary: Build HashRelation outside of HashJoinNode
 Key: SPARK-10887
 URL: https://issues.apache.org/jira/browse/SPARK-10887
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Yin Huai


Right now, HashJoinNode builds a HashRelation for the build side. We can take 
this process out so that HashJoinNode can be used for both broadcast joins and 
shuffled joins.






[jira] [Created] (SPARK-10888) Add as.DataFrame as a synonym for createDataFrame

2015-09-30 Thread Narine Kokhlikyan (JIRA)
Narine Kokhlikyan created SPARK-10888:
-

 Summary: Add as.DataFrame as a synonym for createDataFrame
 Key: SPARK-10888
 URL: https://issues.apache.org/jira/browse/SPARK-10888
 Project: Spark
  Issue Type: Sub-task
Reporter: Narine Kokhlikyan
Priority: Minor


as.DataFrame is a more R-style signature. 
Also, I'd like to know if we could make the context (e.g. sqlContext) global, so 
that we do not have to specify it as an argument each time we create a 
dataframe.






[jira] [Commented] (SPARK-10669) Link to each language's API in codetabs in ML docs: spark.mllib

2015-09-30 Thread Xin Ren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938897#comment-14938897
 ] 

Xin Ren commented on SPARK-10669:
-

may I have a try on this one?

Thanks a lot

> Link to each language's API in codetabs in ML docs: spark.mllib
> ---
>
> Key: SPARK-10669
> URL: https://issues.apache.org/jira/browse/SPARK-10669
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, MLlib
>Reporter: Joseph K. Bradley
>
> In the Markdown docs for the spark.mllib Programming Guide, we have code 
> examples with codetabs for each language.  We should link to each language's 
> API docs within the corresponding codetab, but we are inconsistent about 
> this.  For an example of what we want to do, see the "ChiSqSelector" section 
> in 
> [https://github.com/apache/spark/blob/64743870f23bffb8d96dcc8a0181c1452782a151/docs/mllib-feature-extraction.md]
> This JIRA is just for spark.mllib, not spark.ml






[jira] [Assigned] (SPARK-10886) Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10886:


Assignee: Apache Spark

> Random RDD creation example fix in MLlib statistics programming guide - 
> mllib-statistics.md
> ---
>
> Key: SPARK-10886
> URL: https://issues.apache.org/jira/browse/SPARK-10886
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Jayant Shekhar
>Assignee: Apache Spark
>Priority: Minor
>
> Creating Random RDDs has the following line in the example for Random Data 
> Generation in the MLlib statistics programming guide:
> val u = normalRDD(sc, 100L, 10)
> It should be:
> val u = RandomRDDs.normalRDD(sc, 100L, 10)
> It applies to both the Scala and Java examples.






[jira] [Commented] (SPARK-10886) Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md

2015-09-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938797#comment-14938797
 ] 

Apache Spark commented on SPARK-10886:
--

User 'jayantshekhar' has created a pull request for this issue:
https://github.com/apache/spark/pull/8951

> Random RDD creation example fix in MLlib statistics programming guide - 
> mllib-statistics.md
> ---
>
> Key: SPARK-10886
> URL: https://issues.apache.org/jira/browse/SPARK-10886
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Jayant Shekhar
>Priority: Minor
>
> Creating Random RDDs has the following line in the example for Random Data 
> Generation in the MLlib statistics programming guide:
> val u = normalRDD(sc, 100L, 10)
> It should be:
> val u = RandomRDDs.normalRDD(sc, 100L, 10)
> It applies to both the Scala and Java examples.






[jira] [Assigned] (SPARK-10665) Connect the local iterators with the planner

2015-09-30 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reassigned SPARK-10665:


Assignee: Yin Huai

> Connect the local iterators with the planner
> 
>
> Key: SPARK-10665
> URL: https://issues.apache.org/jira/browse/SPARK-10665
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Yin Huai
>
> After creating these local iterators, we'd need to actually use them.






[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior

2015-09-30 Thread Simeon Simeonov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938814#comment-14938814
 ] 

Simeon Simeonov commented on SPARK-9344:


/cc [~joshrosen] [~andrewor14]

> Spark SQL documentation does not clarify INSERT INTO behavior
> -
>
> Key: SPARK-9344
> URL: https://issues.apache.org/jira/browse/SPARK-9344
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 1.4.1
>Reporter: Simeon Simeonov
>Priority: Minor
>  Labels: documentation, sql
>
> The Spark SQL documentation does not address {{INSERT INTO}} behavior. The 
> section on Hive compatibility is misleading as it claims support for "the 
> vast majority of Hive features". The user mailing list has conflicting 
> information, including posts that claim {{INSERT INTO}} support targeting 1.0.
> In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.






[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior

2015-09-30 Thread Simeon Simeonov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938825#comment-14938825
 ] 

Simeon Simeonov commented on SPARK-9344:


[~joshrosen] when I logged the bug I was using {{HiveContext}}. 

Given how many Spark SQL bugs are logged here, e.g., issues with view support, 
it does make sense for the SQL docs to become more reality-based. :)

> Spark SQL documentation does not clarify INSERT INTO behavior
> -
>
> Key: SPARK-9344
> URL: https://issues.apache.org/jira/browse/SPARK-9344
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 1.4.1
>Reporter: Simeon Simeonov
>Priority: Minor
>  Labels: documentation, sql
>
> The Spark SQL documentation does not address {{INSERT INTO}} behavior. The 
> section on Hive compatibility is misleading as it claims support for "the 
> vast majority of Hive features". The user mailing list has conflicting 
> information, including posts that claim {{INSERT INTO}} support targeting 1.0.
> In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.






[jira] [Created] (SPARK-10886) Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md

2015-09-30 Thread Jayant Shekhar (JIRA)
Jayant Shekhar created SPARK-10886:
--

 Summary: Random RDD creation example fix in MLlib statistics 
programming guide - mllib-statistics.md
 Key: SPARK-10886
 URL: https://issues.apache.org/jira/browse/SPARK-10886
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.5.0
Reporter: Jayant Shekhar
Priority: Minor


Creating Random RDDs has the following line in the example for Random Data 
Generation in the MLlib statistics programming guide:

val u = normalRDD(sc, 100L, 10)

It should be:

val u = RandomRDDs.normalRDD(sc, 100L, 10)

It applies to both the Scala and Java examples.
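
A minimal self-contained version of the corrected example (assuming an existing SparkContext {{sc}}):

{code}
import org.apache.spark.mllib.random.RandomRDDs

// 100 values drawn from the standard normal distribution N(0, 1), in 10 partitions.
val u = RandomRDDs.normalRDD(sc, 100L, 10)
// Apply a transform to get values following N(1, 4), as the guide does next.
val v = u.map(x => 1.0 + 2.0 * x)
{code}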






[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior

2015-09-30 Thread Simeon Simeonov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938806#comment-14938806
 ] 

Simeon Simeonov commented on SPARK-9344:


Are you suggesting to fix the docs or the code?

> Spark SQL documentation does not clarify INSERT INTO behavior
> -
>
> Key: SPARK-9344
> URL: https://issues.apache.org/jira/browse/SPARK-9344
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 1.4.1
>Reporter: Simeon Simeonov
>Priority: Minor
>  Labels: documentation, sql
>
> The Spark SQL documentation does not address {{INSERT INTO}} behavior. The 
> section on Hive compatibility is misleading as it claims support for "the 
> vast majority of Hive features". The user mailing list has conflicting 
> information, including posts that claim {{INSERT INTO}} support targeting 1.0.
> In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.






[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior

2015-09-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938839#comment-14938839
 ] 

Josh Rosen commented on SPARK-9344:
---

[~simeons], I just grepped through Spark's code base looking for {{INSERT 
INTO}} and found some test cases which use it: 
https://github.com/apache/spark/blob/418e5e4cbdaab87addb91ac0bb2245ff0213ac81/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala#L1015

Can you provide an example of a case which produces the parse error which would 
let us reproduce / triage the problem?

> Spark SQL documentation does not clarify INSERT INTO behavior
> -
>
> Key: SPARK-9344
> URL: https://issues.apache.org/jira/browse/SPARK-9344
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 1.4.1
>Reporter: Simeon Simeonov
>Priority: Minor
>  Labels: documentation, sql
>
> The Spark SQL documentation does not address {{INSERT INTO}} behavior. The 
> section on Hive compatibility is misleading as it claims support for "the 
> vast majority of Hive features". The user mailing list has conflicting 
> information, including posts that claim {{INSERT INTO}} support targeting 1.0.
> In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.






[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior

2015-09-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938819#comment-14938819
 ] 

Josh Rosen commented on SPARK-9344:
---

[~simeons], are you using HiveContext or SQLContext? If Spark _does_ support 
{{INSERT INTO}} then I suspect that it's only via the HiveContext interface, 
not plain SQLContext.

> Spark SQL documentation does not clarify INSERT INTO behavior
> -
>
> Key: SPARK-9344
> URL: https://issues.apache.org/jira/browse/SPARK-9344
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 1.4.1
>Reporter: Simeon Simeonov
>Priority: Minor
>  Labels: documentation, sql
>
> The Spark SQL documentation does not address {{INSERT INTO}} behavior. The 
> section on Hive compatibility is misleading as it claims support for "the 
> vast majority of Hive features". The user mailing list has conflicting 
> information, including posts that claim {{INSERT INTO}} support targeting 1.0.
> In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.






[jira] [Commented] (SPARK-10886) Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md

2015-09-30 Thread Jayant Shekhar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938782#comment-14938782
 ] 

Jayant Shekhar commented on SPARK-10886:


I am in the process of creating a PR for it.

> Random RDD creation example fix in MLlib statistics programming guide - 
> mllib-statistics.md
> ---
>
> Key: SPARK-10886
> URL: https://issues.apache.org/jira/browse/SPARK-10886
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Jayant Shekhar
>Priority: Minor
>
> Creating Random RDDs has the following line in the example for Random Data 
> Generation in the MLlib statistics programming guide:
> val u = normalRDD(sc, 100L, 10)
> It should be:
> val u = RandomRDDs.normalRDD(sc, 100L, 10)
> It applies to both the Scala and Java examples.






[jira] [Comment Edited] (SPARK-10886) Random RDD creation example fix in MLlib statistics programming guide - mllib-statistics.md

2015-09-30 Thread Jayant Shekhar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938782#comment-14938782
 ] 

Jayant Shekhar edited comment on SPARK-10886 at 9/30/15 8:27 PM:
-

Am in the process of creating a PR for it.


was (Author: jayants):
Am in the process of PR for it.

> Random RDD creation example fix in MLlib statistics programming guide - 
> mllib-statistics.md
> ---
>
> Key: SPARK-10886
> URL: https://issues.apache.org/jira/browse/SPARK-10886
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Jayant Shekhar
>Priority: Minor
>
> Creating Random RDDs has the following line in the example for Random Data 
> Generation in the MLlib statistics programming guide:
> val u = normalRDD(sc, 100L, 10)
> It should be:
> val u = RandomRDDs.normalRDD(sc, 100L, 10)
> It applies to both the Scala and Java examples.






[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior

2015-09-30 Thread Simeon Simeonov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938918#comment-14938918
 ] 

Simeon Simeonov commented on SPARK-9344:


[~joshrosen] Here is the reproducible test case you can try in {{spark-shell}}:

{code}
import org.apache.spark.sql.hive.HiveContext

val ctx = sqlContext.asInstanceOf[HiveContext]
import ctx.implicits._

(1 to 5).map(Tuple1.apply).toDF("w_int").write.save("test_data1")
(6 to 9).map(Tuple1.apply).toDF("w_int").write.save("test_data2")

ctx.sql("insert into table test_data1 select * from test_data2")
{code}

This fails with:

{code}
scala> ctx.sql("insert into table test_data1 select * from test_data2")
15/09/30 17:32:34 INFO ParseDriver: Parsing command: insert into table 
test_data1 select * from test_data2
15/09/30 17:32:34 INFO ParseDriver: Parse Completed
15/09/30 17:32:34 INFO HiveMetaStore: 0: get_table : db=default tbl=test_data1
15/09/30 17:32:34 INFO audit: ugi=sim   ip=unknown-ip-addr  cmd=get_table : 
db=default tbl=test_data1
org.apache.spark.sql.AnalysisException: no such table test_data1; line 1 pos 18
at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:225)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:231)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:229)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:221)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:212)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:229)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:219)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:61)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at 
scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:59)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:51)
at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:933)
at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:933)
at 
org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931)
at org.apache.spark.sql.DataFrame.(DataFrame.scala:131)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:39)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:44)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:46)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:48)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:50)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:52)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:54)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:56)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:58)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:60)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:62)
at $iwC$$iwC$$iwC$$iwC$$iwC.(:64)
at $iwC$$iwC$$iwC$$iwC.(:66)
at $iwC$$iwC$$iwC.(:68)
at $iwC$$iwC.(:70)
at $iwC.(:72)
at (:74)
at .(:78)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 

[jira] [Commented] (SPARK-10263) Add @Since annotation to ml.param and ml.*

2015-09-30 Thread Hiroshi Takahashi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936630#comment-14936630
 ] 

Hiroshi Takahashi commented on SPARK-10263:
---

[~mengxr] Could you take a look?

> Add @Since annotation to ml.param and ml.*
> --
>
> Key: SPARK-10263
> URL: https://issues.apache.org/jira/browse/SPARK-10263
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML
>Reporter: Xiangrui Meng
>Priority: Minor
>  Labels: starter
>







[jira] [Assigned] (SPARK-7869) Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7869:
---

Assignee: (was: Apache Spark)

> Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns
> --
>
> Key: SPARK-7869
> URL: https://issues.apache.org/jira/browse/SPARK-7869
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.3.0, 1.3.1
> Environment: Spark 1.3.1
>Reporter: Brad Willard
>Priority: Minor
>
> Most of our tables load into dataframes just fine with postgres. However we 
> have a number of tables leveraging the JSONB datatype. Spark will error and 
> refuse to load this table. While asking for Spark to support JSONB might be a 
> tall order in the short term, it would be great if Spark would at least load 
> the table ignoring the columns it can't load or have it be an option.
> pdf = sql_context.load(source="jdbc", url=url, dbtable="table_of_json")
> Py4JJavaError: An error occurred while calling o41.load.
> : java.sql.SQLException: Unsupported type 
> at org.apache.spark.sql.jdbc.JDBCRDD$.getCatalystType(JDBCRDD.scala:78)
> at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:112)
> at org.apache.spark.sql.jdbc.JDBCRelation.(JDBCRelation.scala:133)
> at 
> org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:121)
> at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
> at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
> at org.apache.spark.sql.SQLContext.load(SQLContext.scala:685)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:207)
> at java.lang.Thread.run(Thread.java:745)
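
A possible workaround for the quoted issue, sketched here under stated assumptions: hand the JDBC source a subquery that casts the JSONB column to text, so the driver reports a type the data source can map. The connection URL, table name, and column names are placeholders, and the snippet uses the 1.4+ reader API (assuming a SQLContext named sqlContext); on 1.3 the same options can be passed to sqlContext.load.

{code}
// Hedged sketch, not part of the ticket: cast the JSONB column to text inside a
// subquery so the JDBC type resolution sees a supported type.
val url = "jdbc:postgresql://dbhost:5432/mydb?user=user&password=pass"
val query = "(SELECT id, data::text AS data FROM table_of_json) AS t"

val pdf = sqlContext.read
  .format("jdbc")
  .options(Map("url" -> url, "dbtable" -> query))
  .load()
{code}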



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7869) Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7869:
---

Assignee: Apache Spark

> Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns
> --
>
> Key: SPARK-7869
> URL: https://issues.apache.org/jira/browse/SPARK-7869
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.3.0, 1.3.1
> Environment: Spark 1.3.1
>Reporter: Brad Willard
>Assignee: Apache Spark
>Priority: Minor
>
> Most of our tables load into dataframes just fine with Postgres. However, we 
> have a number of tables leveraging the JSONB datatype. Spark will error out and 
> refuse to load these tables. While asking for Spark to support JSONB might be a 
> tall order in the short term, it would be great if Spark would at least load 
> the table while ignoring the columns it can't handle, or offer an option to do so.
> pdf = sql_context.load(source="jdbc", url=url, dbtable="table_of_json")
> Py4JJavaError: An error occurred while calling o41.load.
> : java.sql.SQLException: Unsupported type 
> at org.apache.spark.sql.jdbc.JDBCRDD$.getCatalystType(JDBCRDD.scala:78)
> at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:112)
> at org.apache.spark.sql.jdbc.JDBCRelation.<init>(JDBCRelation.scala:133)
> at 
> org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:121)
> at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
> at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
> at org.apache.spark.sql.SQLContext.load(SQLContext.scala:685)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:207)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7869) Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns

2015-09-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936695#comment-14936695
 ] 

Apache Spark commented on SPARK-7869:
-

User '0x0FFF' has created a pull request for this issue:
https://github.com/apache/spark/pull/8948

> Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns
> --
>
> Key: SPARK-7869
> URL: https://issues.apache.org/jira/browse/SPARK-7869
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.3.0, 1.3.1
> Environment: Spark 1.3.1
>Reporter: Brad Willard
>Priority: Minor
>
> Most of our tables load into dataframes just fine with Postgres. However, we 
> have a number of tables leveraging the JSONB datatype. Spark will error out and 
> refuse to load these tables. While asking for Spark to support JSONB might be a 
> tall order in the short term, it would be great if Spark would at least load 
> the table while ignoring the columns it can't handle, or offer an option to do so.
> pdf = sql_context.load(source="jdbc", url=url, dbtable="table_of_json")
> Py4JJavaError: An error occurred while calling o41.load.
> : java.sql.SQLException: Unsupported type 
> at org.apache.spark.sql.jdbc.JDBCRDD$.getCatalystType(JDBCRDD.scala:78)
> at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:112)
> at org.apache.spark.sql.jdbc.JDBCRelation.<init>(JDBCRelation.scala:133)
> at 
> org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:121)
> at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
> at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
> at org.apache.spark.sql.SQLContext.load(SQLContext.scala:685)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:207)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10880) Hive module build test failed

2015-09-30 Thread Jean-Baptiste Onofré (JIRA)
Jean-Baptiste Onofré created SPARK-10880:


 Summary: Hive module build test failed
 Key: SPARK-10880
 URL: https://issues.apache.org/jira/browse/SPARK-10880
 Project: Spark
  Issue Type: Bug
  Components: Tests
Reporter: Jean-Baptiste Onofré


On master, sql/hive module tests fail.

The reason is that bin/spark-submit is not found. The impacted tests are:

- SPARK-8468
- SPARK-8020
- SPARK-8489
- SPARK-9757

I'm going to take a look and fix that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9776) Another instance of Derby may have already booted the database

2015-09-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936634#comment-14936634
 ] 

Apache Spark commented on SPARK-9776:
-

User 'KaiXinXiaoLei' has created a pull request for this issue:
https://github.com/apache/spark/pull/8947

> Another instance of Derby may have already booted the database 
> ---
>
> Key: SPARK-9776
> URL: https://issues.apache.org/jira/browse/SPARK-9776
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
> Environment: Mac Yosemite, spark-1.5.0
>Reporter: Sudhakar Thota
> Attachments: SPARK-9776-FL1.rtf
>
>
> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc) results in an 
> error, though the same works in spark-1.4.1.
> Caused by: ERROR XSDB6: Another instance of Derby may have already booted the 
> database 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10881) Unable to use custom log4j appender in spark executor

2015-09-30 Thread David Moravek (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Moravek updated SPARK-10881:
--
Description: 
In CoarseGrainedExecutorBackend, log4j is initialized before the user classpath 
gets registered:

https://github.com/apache/spark/blob/v1.3.1/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L126

In order to use a custom appender, one has to distribute it using `spark-submit 
--files` and set it via spark.executor.extraClassPath.



  was:
In CoarseGrainedExecutorBackend, log4j is initialized, before userclasspath 
gets registered:

https://github.com/apache/spark/blob/v1.3.1/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L126

In order to use custom appender, one have to distribute it using `spark-submit 
--files` and set it via spark.executor.extraClassPath




> Unable to use custom log4j appender in spark executor
> -
>
> Key: SPARK-10881
> URL: https://issues.apache.org/jira/browse/SPARK-10881
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1
>Reporter: David Moravek
>Priority: Minor
>
> In CoarseGrainedExecutorBackend, log4j is initialized before the user classpath 
> gets registered:
> https://github.com/apache/spark/blob/v1.3.1/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L126
> In order to use a custom appender, one has to distribute it using `spark-submit 
> --files` and set it via spark.executor.extraClassPath.
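
For concreteness, the workaround described above can also be expressed as configuration rather than spark-submit flags. A hedged sketch (the jar path and file name are placeholders; the relative extraClassPath entry assumes the file lands in each executor's working directory):

{code}
// Equivalent of `spark-submit --files my-log4j-appender.jar
//   --conf spark.executor.extraClassPath=my-log4j-appender.jar`
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("custom-appender-example")
  .set("spark.files", "/local/path/my-log4j-appender.jar")        // shipped to each executor
  .set("spark.executor.extraClassPath", "my-log4j-appender.jar")  // on the executor JVM classpath at launch
val sc = new SparkContext(conf)
{code}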



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10881) Unable to use custom log4j appender in spark executor

2015-09-30 Thread David Moravek (JIRA)
David Moravek created SPARK-10881:
-

 Summary: Unable to use custom log4j appender in spark executor
 Key: SPARK-10881
 URL: https://issues.apache.org/jira/browse/SPARK-10881
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.1
Reporter: David Moravek
Priority: Minor


In CoarseGrainedExecutorBackend, log4j is initialized before the user classpath 
gets registered:

https://github.com/apache/spark/blob/v1.3.1/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L126

In order to use a custom appender, one has to distribute it using `spark-submit 
--files` and set it via spark.executor.extraClassPath.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10882) Add the ability to connect to secured mqtt brokers

2015-09-30 Thread Alexandre Touret (JIRA)
Alexandre Touret created SPARK-10882:


 Summary: Add the ability to connect to secured mqtt brokers
 Key: SPARK-10882
 URL: https://issues.apache.org/jira/browse/SPARK-10882
 Project: Spark
  Issue Type: Improvement
  Components: Input/Output
Affects Versions: 1.5.0, 1.5.1
Reporter: Alexandre Touret


Currently, there's no way to connect to secured MQTT brokers.
For example, I'm trying to subscribe to an MQTT topic hosted on a single 
RabbitMQ instance. I can't provide the credentials during the connection.

Furthermore, I saw in the source code 
(https://github.com/apache/spark/blob/7478c8b66d6a2b1179f20c38b49e27e37b0caec3/external/mqtt/src/main/scala/org/apache/spark/streaming/mqtt/MQTTInputDStream.scala#L50)
that credentials are never initialized.

It would be nice to add this ability to Spark.

Regards
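
For context, the underlying Eclipse Paho client that MQTTInputDStream wraps already accepts credentials, so the request is essentially about plumbing them through. A hedged illustration of that client capability (this is not a Spark API; broker URL, topic and credentials are placeholders):

{code}
import org.eclipse.paho.client.mqttv3.{MqttClient, MqttConnectOptions}

val options = new MqttConnectOptions()
options.setUserName("user")
options.setPassword("secret".toCharArray)

val client = new MqttClient("tcp://broker.example.com:1883", MqttClient.generateClientId())
client.connect(options)  // the streaming receiver currently connects without passing any options
{code}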



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10865) [Spark SQL] [UDF] the ceil/ceiling function got wrong return value type

2015-09-30 Thread Alexey Grishchenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936799#comment-14936799
 ] 

Alexey Grishchenko commented on SPARK-10865:


I don't really think that _floor()_ and _ceiling()_ should return an integer data 
type. For instance, in 
[Oracle|http://docs.oracle.com/cd/E11882_01/server.112/e10592/functions067.htm#SQLRF00643],
[Postgres|http://www.postgresql.org/docs/9.5/static/functions-math.html] and 
[MS SQL|https://msdn.microsoft.com/en-us/library/ms178531.aspx], these functions 
return the same data type that was provided as input.

> [Spark SQL] [UDF] the ceil/ceiling function got wrong return value type
> ---
>
> Key: SPARK-10865
> URL: https://issues.apache.org/jira/browse/SPARK-10865
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>
> As per the ceil/ceiling definition, it should return a BIGINT value:
> -ceil(DOUBLE a), ceiling(DOUBLE a)
> -Returns the minimum BIGINT value that is equal to or greater than a.
> But the current Spark implementation returns the wrong value type.
> e.g., 
> select ceil(2642.12) from udf_test_web_sales limit 1;
> 2643.0
> The Hive implementation returns the value type shown below:
> hive> select ceil(2642.12) from udf_test_web_sales limit 1;
> OK
> 2643
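
A quick way to observe the reported mismatch from a 1.5.0 shell (hedged sketch; sqlContext is the shell's predefined context and the literal follows the example above):

{code}
// Per the Hive definition the result column should be bigint/long; the report
// indicates 1.5.0 produces a double instead (hence the 2643.0 above).
sqlContext.sql("SELECT ceil(2642.12) AS c").printSchema()
sqlContext.sql("SELECT ceil(2642.12) AS c").show()
{code}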



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10879) spark on yarn support priority option

2015-09-30 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936813#comment-14936813
 ] 

Marcelo Vanzin commented on SPARK-10879:


Is this related to YARN-1963? It's nice to reference the source of the 
functionality that is being implemented.

> spark on yarn support priority option
> -
>
> Key: SPARK-10879
> URL: https://issues.apache.org/jira/browse/SPARK-10879
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit, YARN
>Reporter: Yun Zhao
>
> Add a YARN-only option to spark-submit: *--priority PRIORITY*. The priority 
> of your YARN application (default: 0).
> Add a property: *spark.yarn.priority*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10887) Build HashedRelation outside of HashJoinNode

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10887:


Assignee: Apache Spark  (was: Yin Huai)

> Build HashedRelation outside of HashJoinNode
> 
>
> Key: SPARK-10887
> URL: https://issues.apache.org/jira/browse/SPARK-10887
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Apache Spark
>
> Right now, HashJoinNode builds a HashRelation for the build side. We can take 
> this process out. So, we can use HashJoinNode for both Broadcast join and 
> shuffled join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10887) Build HashedRelation outside of HashJoinNode

2015-09-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939047#comment-14939047
 ] 

Apache Spark commented on SPARK-10887:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/8953

> Build HashedRelation outside of HashJoinNode
> 
>
> Key: SPARK-10887
> URL: https://issues.apache.org/jira/browse/SPARK-10887
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
>
> Right now, HashJoinNode builds a HashRelation for the build side. We can take 
> this process out. So, we can use HashJoinNode for both Broadcast join and 
> shuffled join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10891) Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka

2015-09-30 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-10891:
---

 Summary: Add MessageHandler to KinesisUtils.createStream similar 
to Direct Kafka
 Key: SPARK-10891
 URL: https://issues.apache.org/jira/browse/SPARK-10891
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Reporter: Burak Yavuz


There is support for a message handler in the Direct Kafka stream, which allows 
an arbitrary T to be the output of the stream instead of Array[Byte]. This is a 
very useful feature and should therefore exist in Kinesis as well.
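
For reference, a sketch of the Direct Kafka handler being pointed to, plus what a Kinesis analogue might look like. It assumes an existing SparkContext named sc; broker, topic and offsets are placeholders, and the Kinesis signature at the end is hypothetical and only illustrates the proposal.

{code}
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(sc, Seconds(10))
val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
val fromOffsets = Map(TopicAndPartition("events", 0) -> 0L)

// The messageHandler turns each record into an arbitrary type R (here a (key, length)
// pair) instead of the default key/value output.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, Int)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message.length))

// A Kinesis analogue could plausibly accept a Record => T handler (hypothetical):
// KinesisUtils.createStream[T](ssc, appName, streamName, endpointUrl, regionName,
//   initialPosition, checkpointInterval, storageLevel,
//   (record: Record) => deserialize(record.getData))
{code}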



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10560) Make StreamingLogisticRegressionWithSGD Python API equals with Scala one

2015-09-30 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939067#comment-14939067
 ] 

Bryan Cutler commented on SPARK-10560:
--

Hi [~yanboliang], I'd be happy to do this unless you have already started?

> Make StreamingLogisticRegressionWithSGD Python API equals with Scala one
> 
>
> Key: SPARK-10560
> URL: https://issues.apache.org/jira/browse/SPARK-10560
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Reporter: Yanbo Liang
>Priority: Minor
>
> The StreamingLogisticRegressionWithSGD Python API lacks some parameters 
> compared with the Scala one; here we make them equal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10888) Add as.DataFrame as a synonym for createDataFrame

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10888:


Assignee: (was: Apache Spark)

> Add as.DataFrame as a synonym for createDataFrame
> -
>
> Key: SPARK-10888
> URL: https://issues.apache.org/jira/browse/SPARK-10888
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Narine Kokhlikyan
>Priority: Minor
>
> as.DataFrame is a more R-style signature. 
> Also, I'd like to know if we could make the context (e.g. sqlContext) global, 
> so that we do not have to specify it as an argument each time we create 
> a DataFrame.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10888) Add as.DataFrame as a synonym for createDataFrame

2015-09-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938923#comment-14938923
 ] 

Apache Spark commented on SPARK-10888:
--

User 'NarineK' has created a pull request for this issue:
https://github.com/apache/spark/pull/8952

> Add as.DataFrame as a synonym for createDataFrame
> -
>
> Key: SPARK-10888
> URL: https://issues.apache.org/jira/browse/SPARK-10888
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Narine Kokhlikyan
>Priority: Minor
>
> as.DataFrame is a more R-style signature. 
> Also, I'd like to know if we could make the context (e.g. sqlContext) global, 
> so that we do not have to specify it as an argument each time we create 
> a DataFrame.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10889) Upgrade Kinesis Client Library

2015-09-30 Thread Avrohom Katz (JIRA)
Avrohom Katz created SPARK-10889:


 Summary: Upgrade Kinesis Client Library
 Key: SPARK-10889
 URL: https://issues.apache.org/jira/browse/SPARK-10889
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.4.2, 1.5.2, 1.6.0
Reporter: Avrohom Katz
Priority: Minor


The Kinesis Client Library added a custom CloudWatch metric called 
MillisBehindLatest in 1.3.0. This is very important for capacity planning and alerting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9617) Implement json_tuple

2015-09-30 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-9617.
-
   Resolution: Fixed
 Assignee: Nathan Howell
Fix Version/s: 1.6.0

This issue has been resolved by https://github.com/apache/spark/pull/7946/.

> Implement json_tuple
> 
>
> Key: SPARK-9617
> URL: https://issues.apache.org/jira/browse/SPARK-9617
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Nathan Howell
>Assignee: Nathan Howell
>Priority: Minor
> Fix For: 1.6.0
>
>
> Provide a native Spark implementation for {{json_tuple}}
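
A minimal usage sketch of the function in question, assuming a SQL context named sqlContext and a table events with a JSON string column json_col (both hypothetical):

{code}
// json_tuple is a generator (UDTF): it extracts several top-level JSON fields per row.
sqlContext.sql("""
  SELECT t.id, v.name, v.age
  FROM events t
  LATERAL VIEW json_tuple(t.json_col, 'name', 'age') v AS name, age
""").show()
{code}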



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10887) Build HashedRelation outside of HashJoinNode

2015-09-30 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reassigned SPARK-10887:


Assignee: Yin Huai

> Build HashedRelation outside of HashJoinNode
> 
>
> Key: SPARK-10887
> URL: https://issues.apache.org/jira/browse/SPARK-10887
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
>
> Right now, HashJoinNode builds a HashRelation for the build side. We can take 
> this process out. So, we can use HashJoinNode for both Broadcast join and 
> shuffled join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10887) Build HashedRelation outside of HashJoinNode

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10887:


Assignee: Yin Huai  (was: Apache Spark)

> Build HashedRelation outside of HashJoinNode
> 
>
> Key: SPARK-10887
> URL: https://issues.apache.org/jira/browse/SPARK-10887
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
>
> Right now, HashJoinNode builds a HashRelation for the build side. We can take 
> this process out. So, we can use HashJoinNode for both Broadcast join and 
> shuffled join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10790) Dynamic Allocation does not request any executors if first stage needs less than or equal to spark.dynamicAllocation.initialExecutors

2015-09-30 Thread Jonathan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939060#comment-14939060
 ] 

Jonathan Kelly commented on SPARK-10790:


Thank you for the explanation and for such a quick fix, [~jerryshao]!

> Dynamic Allocation does not request any executors if first stage needs less 
> than or equal to spark.dynamicAllocation.initialExecutors
> -
>
> Key: SPARK-10790
> URL: https://issues.apache.org/jira/browse/SPARK-10790
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.5.0
>Reporter: Jonathan Kelly
>Assignee: Saisai Shao
> Fix For: 1.5.2, 1.6.0
>
>
> If you set spark.dynamicAllocation.initialExecutors > 0 (or 
> spark.dynamicAllocation.minExecutors, since 
> spark.dynamicAllocation.initialExecutors defaults to 
> spark.dynamicAllocation.minExecutors), and the number of tasks in the first 
> stage of your job is less than or equal to this min/init number of executors, 
> dynamic allocation won't actually request any executors and will just hang 
> indefinitely with the warning "Initial job has not accepted any resources; 
> check your cluster UI to ensure that workers are registered and have 
> sufficient resources".
> The cause appears to be that ExecutorAllocationManager does not request any 
> executors while the application is still initializing, but it still sets the 
> initial value of numExecutorsTarget to 
> spark.dynamicAllocation.initialExecutors. Once the job is running and has 
> submitted its first task, if the first task does not need more than 
> spark.dynamicAllocation.initialExecutors, 
> ExecutorAllocationManager.updateAndSyncNumExecutorsTarget() does not think 
> that it needs to request any executors, so it doesn't.
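
A hedged sketch of a configuration that hits the condition described above (the values are illustrative): the first stage has no more tasks than the initial/minimum executor count.

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")            // required by dynamic allocation
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.initialExecutors", "2")
// A first stage with at most 2 tasks, e.g. sc.parallelize(1 to 10, 2).count(),
// then never triggers an executor request in the affected version.
{code}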



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10892) Join with Data Frame returns wrong results

2015-09-30 Thread Ofer Mendelevitch (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ofer Mendelevitch updated SPARK-10892:
--
Attachment: data.json

Data file to run with the code, for reproducing.
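
A hedged workaround sketch for the join in the description below: project and rename the shared "value" column in each frame before joining, so the result never carries three identically named columns (frame and column names follow the reporter's example).

{code}
// Keep only the columns each side contributes, renaming "value" up front.
val prcpR = prcp.select("date_str", "year", "month", "day", "value").withColumnRenamed("value", "PRCP")
val tminR = tmin.select("date_str", "value").withColumnRenamed("value", "TMIN")
val tmaxR = tmax.select("date_str", "value").withColumnRenamed("value", "TMAX")

val out = prcpR.join(tminR, "date_str").join(tmaxR, "date_str")
out.filter("year=2012 and month=10").show()
{code}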

> Join with Data Frame returns wrong results
> --
>
> Key: SPARK-10892
> URL: https://issues.apache.org/jira/browse/SPARK-10892
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Ofer Mendelevitch
>Priority: Critical
> Attachments: data.json
>
>
> I'm attaching a simplified reproducible example of the problem:
> 1. Loading a JSON file from HDFS as a Data Frame
> 2. Creating 3 data frames: PRCP, TMIN, TMAX
> 3. Joining the data frames together. Each of them has a column named "value", 
> so the columns are renamed after the join.
> 4. The output seems incorrect; the first column has the correct values, but 
> the two other columns seem to contain a copy of the values from the first column.
> Here's the sample code:
> import org.apache.spark.sql._
> val sqlc = new SQLContext(sc)
> val weather = sqlc.read.format("json").load("data.json")
> val prcp = weather.filter("metric = 'PRCP'").as("prcp").cache()
> val tmin = weather.filter("metric = 'TMIN'").as("tmin").cache()
> val tmax = weather.filter("metric = 'TMAX'").as("tmax").cache()
> prcp.filter("year=2012 and month=10").show()
> tmin.filter("year=2012 and month=10").show()
> tmax.filter("year=2012 and month=10").show()
> val out = (prcp.join(tmin, "date_str").join(tmax, "date_str")
>   .select(prcp("year"), prcp("month"), prcp("day"), prcp("date_str"),
> prcp("value").alias("PRCP"), tmin("value").alias("TMIN"),
> tmax("value").alias("TMAX")) )
> out.filter("year=2012 and month=10").show()
> The output is:
> ++---+--+-+---+-++
> |date_str|day|metric|month|station|value|year|
> ++---+--+-+---+-++
> |20121001|  1|  PRCP|   10|USW00023272|0|2012|
> |20121002|  2|  PRCP|   10|USW00023272|0|2012|
> |20121003|  3|  PRCP|   10|USW00023272|0|2012|
> |20121004|  4|  PRCP|   10|USW00023272|0|2012|
> |20121005|  5|  PRCP|   10|USW00023272|0|2012|
> |20121006|  6|  PRCP|   10|USW00023272|0|2012|
> |20121007|  7|  PRCP|   10|USW00023272|0|2012|
> |20121008|  8|  PRCP|   10|USW00023272|0|2012|
> |20121009|  9|  PRCP|   10|USW00023272|0|2012|
> |20121010| 10|  PRCP|   10|USW00023272|0|2012|
> |20121011| 11|  PRCP|   10|USW00023272|3|2012|
> |20121012| 12|  PRCP|   10|USW00023272|0|2012|
> |20121013| 13|  PRCP|   10|USW00023272|0|2012|
> |20121014| 14|  PRCP|   10|USW00023272|0|2012|
> |20121015| 15|  PRCP|   10|USW00023272|0|2012|
> |20121016| 16|  PRCP|   10|USW00023272|0|2012|
> |20121017| 17|  PRCP|   10|USW00023272|0|2012|
> |20121018| 18|  PRCP|   10|USW00023272|0|2012|
> |20121019| 19|  PRCP|   10|USW00023272|0|2012|
> |20121020| 20|  PRCP|   10|USW00023272|0|2012|
> ++---+--+-+---+-++
> ++---+--+-+---+-++
> |date_str|day|metric|month|station|value|year|
> ++---+--+-+---+-++
> |20121001|  1|  TMIN|   10|USW00023272|  139|2012|
> |20121002|  2|  TMIN|   10|USW00023272|  178|2012|
> |20121003|  3|  TMIN|   10|USW00023272|  144|2012|
> |20121004|  4|  TMIN|   10|USW00023272|  144|2012|
> |20121005|  5|  TMIN|   10|USW00023272|  139|2012|
> |20121006|  6|  TMIN|   10|USW00023272|  128|2012|
> |20121007|  7|  TMIN|   10|USW00023272|  122|2012|
> |20121008|  8|  TMIN|   10|USW00023272|  122|2012|
> |20121009|  9|  TMIN|   10|USW00023272|  139|2012|
> |20121010| 10|  TMIN|   10|USW00023272|  128|2012|
> |20121011| 11|  TMIN|   10|USW00023272|  122|2012|
> |20121012| 12|  TMIN|   10|USW00023272|  117|2012|
> |20121013| 13|  TMIN|   10|USW00023272|  122|2012|
> |20121014| 14|  TMIN|   10|USW00023272|  128|2012|
> |20121015| 15|  TMIN|   10|USW00023272|  128|2012|
> |20121016| 16|  TMIN|   10|USW00023272|  156|2012|
> |20121017| 17|  TMIN|   10|USW00023272|  139|2012|
> |20121018| 18|  TMIN|   10|USW00023272|  161|2012|
> |20121019| 19|  TMIN|   10|USW00023272|  133|2012|
> |20121020| 20|  TMIN|   10|USW00023272|  122|2012|
> ++---+--+-+---+-++
> ++---+--+-+---+-++
> |date_str|day|metric|month|station|value|year|
> ++---+--+-+---+-++
> |20121001|  1|  TMAX|   10|USW00023272|  322|2012|
> |20121002|  2|  TMAX|   10|USW00023272|  344|2012|
> |20121003|  3|  TMAX|   10|USW00023272|  222|2012|
> |20121004|  4|  TMAX|   10|USW00023272|  189|2012|
> |20121005|  5|  TMAX|   10|USW00023272|  194|2012|
> |20121006|  6|  TMAX|   10|USW00023272|  200|2012|
> 

[jira] [Updated] (SPARK-10887) Build HashedRelation outside of HashJoinNode

2015-09-30 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-10887:
-
Summary: Build HashedRelation outside of HashJoinNode  (was: Build 
HashRelation outside of HashJoinNode)

> Build HashedRelation outside of HashJoinNode
> 
>
> Key: SPARK-10887
> URL: https://issues.apache.org/jira/browse/SPARK-10887
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>
> Right now, HashJoinNode builds a HashRelation for the build side. We can take 
> this process out. So, we can use HashJoinNode for both Broadcast join and 
> shuffled join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10888) Add as.DataFrame as a synonym for createDataFrame

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10888:


Assignee: Apache Spark

> Add as.DataFrame as a synonym for createDataFrame
> -
>
> Key: SPARK-10888
> URL: https://issues.apache.org/jira/browse/SPARK-10888
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Narine Kokhlikyan
>Assignee: Apache Spark
>Priority: Minor
>
> as.DataFrame is a more R-style signature. 
> Also, I'd like to know if we could make the context (e.g. sqlContext) global, 
> so that we do not have to specify it as an argument each time we create 
> a DataFrame.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual

2015-09-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10561:
---
Description: 
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.

  was:
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.



> Provide tooling for auto-generating Spark SQL reference manual
> --
>
> Key: SPARK-10561
> URL: https://issues.apache.org/jira/browse/SPARK-10561
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Reporter: Ted Yu
>
> Here is the discussion thread:
> http://search-hadoop.com/m/q3RTtcD20F1o62xE
> Richard Hillegas made the following suggestion:
> A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
> to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
> support provided by the Scala language. I am new to programming in Scala, so 
> I don't know whether the Scala ecosystem provides any good tools for 
> reverse-engineering a BNF from a class which extends 
> scala.util.parsing.combinator.syntactical.StandardTokenParsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10890) "Column count does not match; SQL statement:" error in JDBCWriteSuite

2015-09-30 Thread Rick Hillegas (JIRA)
Rick Hillegas created SPARK-10890:
-

 Summary: "Column count does not match; SQL statement:" error in 
JDBCWriteSuite
 Key: SPARK-10890
 URL: https://issues.apache.org/jira/browse/SPARK-10890
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.5.0
Reporter: Rick Hillegas


I get the following error when I run the following test...

mvn -Dhadoop.version=2.4.0 
-DwildcardSuites=org.apache.spark.sql.jdbc.JDBCWriteSuite test

{noformat}
JDBCWriteSuite:
13:22:15.603 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
13:22:16.506 WARN org.apache.spark.metrics.MetricsSystem: Using default name 
DAGScheduler for source because spark.app.id is not set.
- Basic CREATE
- CREATE with overwrite
- CREATE then INSERT to append
- CREATE then INSERT to truncate
13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in 
stage 23.0 (TID 31)
org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.message.DbException.get(DbException.java:144)
at org.h2.command.dml.Insert.prepare(Insert.java:265)
at org.h2.command.Parser.prepareCommand(Parser.java:247)
at org.h2.engine.Session.prepareLocal(Session.java:446)
at org.h2.engine.Session.prepareCommand(Session.java:388)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
at 
org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228)
at 
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
at 
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in 
stage 23.0 (TID 32)
org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.message.DbException.get(DbException.java:144)
at org.h2.command.dml.Insert.prepare(Insert.java:265)
at org.h2.command.Parser.prepareCommand(Parser.java:247)
at org.h2.engine.Session.prepareLocal(Session.java:446)
at org.h2.engine.Session.prepareCommand(Session.java:388)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
at 
org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228)
at 
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
at 
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
at 

[jira] [Commented] (SPARK-10889) Upgrade Kinesis Client Library

2015-09-30 Thread Burak Yavuz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939023#comment-14939023
 ] 

Burak Yavuz commented on SPARK-10889:
-

In addition, KCL 1.4.0 supports de-aggregation of records.

> Upgrade Kinesis Client Library
> --
>
> Key: SPARK-10889
> URL: https://issues.apache.org/jira/browse/SPARK-10889
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.4.2, 1.5.2, 1.6.0
>Reporter: Avrohom Katz
>Priority: Minor
>
> The Kinesis Client Library added a custom CloudWatch metric called 
> MillisBehindLatest in 1.3.0. This is very important for capacity planning and alerting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10337) Views are broken

2015-09-30 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-10337:
-
Assignee: Wenchen Fan

> Views are broken
> 
>
> Key: SPARK-10337
> URL: https://issues.apache.org/jira/browse/SPARK-10337
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Michael Armbrust
>Assignee: Wenchen Fan
>Priority: Critical
>
> I haven't dug into this yet... but it seems like this should work:
> This works:
> {code}
> SELECT * FROM 100milints
> {code}
> This seems to work:
> {code}
> CREATE VIEW testView AS SELECT * FROM 100milints
> {code}
> This fails:
> {code}
> SELECT * FROM testView
> org.apache.spark.sql.AnalysisException: cannot resolve '100milints.col' given 
> input columns id; line 1 pos 7
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:53)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:292)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:290)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:290)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>   at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:279)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:290)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:108)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:118)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:122)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:122)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:126)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>   at 

[jira] [Created] (SPARK-10892) Join with Data Frame returns wrong results

2015-09-30 Thread Ofer Mendelevitch (JIRA)
Ofer Mendelevitch created SPARK-10892:
-

 Summary: Join with Data Frame returns wrong results
 Key: SPARK-10892
 URL: https://issues.apache.org/jira/browse/SPARK-10892
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0, 1.4.1
Reporter: Ofer Mendelevitch
Priority: Critical


I'm attaching a simplified reproducible example of the problem:
1. Loading a JSON file from HDFS as a Data Frame
2. Creating 3 data frames: PRCP, TMIN, TMAX
3. Joining the data frames together. Each of them has a column named "value", 
so the columns are renamed after the join.
4. The output seems incorrect; the first column has the correct values, but the 
two other columns seem to contain a copy of the values from the first column.

Here's the sample code:

import org.apache.spark.sql._
val sqlc = new SQLContext(sc)

val weather = sqlc.read.format("json").load("data.json")
val prcp = weather.filter("metric = 'PRCP'").as("prcp").cache()
val tmin = weather.filter("metric = 'TMIN'").as("tmin").cache()
val tmax = weather.filter("metric = 'TMAX'").as("tmax").cache()

prcp.filter("year=2012 and month=10").show()
tmin.filter("year=2012 and month=10").show()
tmax.filter("year=2012 and month=10").show()

val out = (prcp.join(tmin, "date_str").join(tmax, "date_str")
  .select(prcp("year"), prcp("month"), prcp("day"), prcp("date_str"),
prcp("value").alias("PRCP"), tmin("value").alias("TMIN"),
tmax("value").alias("TMAX")) )
out.filter("year=2012 and month=10").show()

The output is:

++---+--+-+---+-++
|date_str|day|metric|month|station|value|year|
++---+--+-+---+-++
|20121001|  1|  PRCP|   10|USW00023272|0|2012|
|20121002|  2|  PRCP|   10|USW00023272|0|2012|
|20121003|  3|  PRCP|   10|USW00023272|0|2012|
|20121004|  4|  PRCP|   10|USW00023272|0|2012|
|20121005|  5|  PRCP|   10|USW00023272|0|2012|
|20121006|  6|  PRCP|   10|USW00023272|0|2012|
|20121007|  7|  PRCP|   10|USW00023272|0|2012|
|20121008|  8|  PRCP|   10|USW00023272|0|2012|
|20121009|  9|  PRCP|   10|USW00023272|0|2012|
|20121010| 10|  PRCP|   10|USW00023272|0|2012|
|20121011| 11|  PRCP|   10|USW00023272|3|2012|
|20121012| 12|  PRCP|   10|USW00023272|0|2012|
|20121013| 13|  PRCP|   10|USW00023272|0|2012|
|20121014| 14|  PRCP|   10|USW00023272|0|2012|
|20121015| 15|  PRCP|   10|USW00023272|0|2012|
|20121016| 16|  PRCP|   10|USW00023272|0|2012|
|20121017| 17|  PRCP|   10|USW00023272|0|2012|
|20121018| 18|  PRCP|   10|USW00023272|0|2012|
|20121019| 19|  PRCP|   10|USW00023272|0|2012|
|20121020| 20|  PRCP|   10|USW00023272|0|2012|
++---+--+-+---+-++

++---+--+-+---+-++
|date_str|day|metric|month|station|value|year|
++---+--+-+---+-++
|20121001|  1|  TMIN|   10|USW00023272|  139|2012|
|20121002|  2|  TMIN|   10|USW00023272|  178|2012|
|20121003|  3|  TMIN|   10|USW00023272|  144|2012|
|20121004|  4|  TMIN|   10|USW00023272|  144|2012|
|20121005|  5|  TMIN|   10|USW00023272|  139|2012|
|20121006|  6|  TMIN|   10|USW00023272|  128|2012|
|20121007|  7|  TMIN|   10|USW00023272|  122|2012|
|20121008|  8|  TMIN|   10|USW00023272|  122|2012|
|20121009|  9|  TMIN|   10|USW00023272|  139|2012|
|20121010| 10|  TMIN|   10|USW00023272|  128|2012|
|20121011| 11|  TMIN|   10|USW00023272|  122|2012|
|20121012| 12|  TMIN|   10|USW00023272|  117|2012|
|20121013| 13|  TMIN|   10|USW00023272|  122|2012|
|20121014| 14|  TMIN|   10|USW00023272|  128|2012|
|20121015| 15|  TMIN|   10|USW00023272|  128|2012|
|20121016| 16|  TMIN|   10|USW00023272|  156|2012|
|20121017| 17|  TMIN|   10|USW00023272|  139|2012|
|20121018| 18|  TMIN|   10|USW00023272|  161|2012|
|20121019| 19|  TMIN|   10|USW00023272|  133|2012|
|20121020| 20|  TMIN|   10|USW00023272|  122|2012|
++---+--+-+---+-++


++---+--+-+---+-++
|date_str|day|metric|month|station|value|year|
++---+--+-+---+-++
|20121001|  1|  TMAX|   10|USW00023272|  322|2012|
|20121002|  2|  TMAX|   10|USW00023272|  344|2012|
|20121003|  3|  TMAX|   10|USW00023272|  222|2012|
|20121004|  4|  TMAX|   10|USW00023272|  189|2012|
|20121005|  5|  TMAX|   10|USW00023272|  194|2012|
|20121006|  6|  TMAX|   10|USW00023272|  200|2012|
|20121007|  7|  TMAX|   10|USW00023272|  167|2012|
|20121008|  8|  TMAX|   10|USW00023272|  183|2012|
|20121009|  9|  TMAX|   10|USW00023272|  194|2012|
|20121010| 10|  TMAX|   10|USW00023272|  183|2012|
|20121011| 11|  TMAX|   10|USW00023272|  139|2012|
|20121012| 12|  TMAX|   10|USW00023272|  161|2012|
|20121013| 13|  TMAX|   10|USW00023272|  211|2012|
|20121014| 14|  TMAX|   10|USW00023272|  189|2012|
|20121015| 15|  

[jira] [Commented] (SPARK-9344) Spark SQL documentation does not clarify INSERT INTO behavior

2015-09-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939130#comment-14939130
 ] 

Josh Rosen commented on SPARK-9344:
---

I think this is the relevant part of the stacktrace:

{code}
org.apache.spark.sql.AnalysisException: no such table test_data1; 
{code}

The Data Sources {{save}} method does not automatically register / create 
tables or temp tables.  Did you mean to use {{saveAsTable}} instead? See  
https://spark.apache.org/docs/latest/sql-programming-guide.html#saving-to-persistent-tables
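
A hedged sketch of that distinction (Spark 1.4/1.5, assuming a HiveContext named sqlContext and some DataFrame df; paths and table names are placeholders):

{code}
// save() only writes files; no table named "test_data1" is registered, so a later
// SQL statement that refers to it fails with "no such table".
df.write.format("parquet").save("/tmp/test_data1")

// saveAsTable() writes the data and registers a persistent table, so SQL (including
// INSERT INTO) can refer to it by name afterwards.
df.write.format("parquet").saveAsTable("test_data1")
sqlContext.sql("SELECT COUNT(*) FROM test_data1").show()
{code}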

> Spark SQL documentation does not clarify INSERT INTO behavior
> -
>
> Key: SPARK-9344
> URL: https://issues.apache.org/jira/browse/SPARK-9344
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 1.4.1
>Reporter: Simeon Simeonov
>Priority: Minor
>  Labels: documentation, sql
>
> The Spark SQL documentation does not address {{INSERT INTO}} behavior. The 
> section on Hive compatibility is misleading as it claims support for "the 
> vast majority of Hive features". The user mailing list has conflicting 
> information, including posts that claim {{INSERT INTO}} support targeting 1.0.
> In 1.4.1, using Hive {{INSERT INTO}} syntax generates parse errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-09-30 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG analytic function gives the wrong result. In 
my test case it was always giving the fixed value '103079215105' when I ran it 
on an integer column.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a YARN cluster.
I did not test other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );


  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. I did 
not test on a yarn cluster.

Input Jason:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );



> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG analytic function gives the wrong result. In 
> my test case it was always giving the fixed value '103079215105' when I ran it 
> on an integer column.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a YARN cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> Java:
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
> 
> df = df.withColumn(
>   "previous",
>   lag(dataFrame.col("VBB"), 1)
> .over(Window.orderBy(dataFrame.col("VAA")))
>   );



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10893) Lag Analytic function broken

2015-09-30 Thread Jo Desmet (JIRA)
Jo Desmet created SPARK-10893:
-

 Summary: Lag Analytic function broken
 Key: SPARK-10893
 URL: https://issues.apache.org/jira/browse/SPARK-10893
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 1.5.0
 Environment: Spark Standalone Cluster on Linux
Reporter: Jo Desmet


Trying to aggregate with the LAG analytic function gives the wrong result. In 
my test case it was always giving the fixed value '103079215105' when I ran it 
on an integer column.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-09-30 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG analytic function gives the wrong result. In 
my test case it was always giving the fixed value '103079215105' when I ran it 
on an integer column.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a YARN cluster.
I did not test other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
{code:title=Bar.java|borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(dataFrame.col("VBB"), 1)
.over(Window.orderBy(dataFrame.col("VAA")))
  );
{code}

  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );



> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> Java:
> {code:title=Bar.java|borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
> 
> df = df.withColumn(
>   "previous",
>   lag(df.col("VBB"), 1)
>     .over(Window.orderBy(df.col("VAA")))
>   );
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-09-30 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );
{code}

  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
{code:title=Bar.java|borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );
{code}


> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> Java:
> {code:borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
> 
> df = df.withColumn(
>   "previous",
>   lag(df.col("VBB"), 1)
>     .over(Window.orderBy(df.col("VAA")))
>   );
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-09-30 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );
{code}

  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );
{code}


> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> {code}
> Java:
> {code:borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
> 
> df = df.withColumn(
>   "previous",
>   lag(df.col("VBB"), 1)
>     .over(Window.orderBy(df.col("VAA")))
>   );
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-09-30 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}
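
For reference, the Java snippet above also needs a static import of 
org.apache.spark.sql.functions.lag and an import of 
org.apache.spark.sql.expressions.Window. A minimal cross-check of the same 
window through the SQL path (a sketch; the temp-table name "t" is arbitrary) 
would look roughly like:

{code:borderStyle=solid}
// Register the frame and run the identical LAG window via HiveQL, to see
// whether the bogus 103079215105 value also shows up through the SQL path.
df.registerTempTable("t");
DataFrame viaSql = sqlContext.sql(
    "SELECT VAA, VBB, LAG(VBB, 1) OVER (ORDER BY VAA) AS previous FROM t");
viaSql.show();
{code}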





  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}

{code}
[null,null,null]
[A,1,null]
[B,-1,1]
[C,2,-1]
[d,3,2]





> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> {code}
> Java:
> {code:borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
> 
> df = df.withColumn(
>   "previous",
>   lag(df.col("VBB"), 1)
>     .over(Window.orderBy(df.col("VAA")))
>   );
> {code}
> Expected Result:
> {code:borderStyle=solid}
> {"VAA":null, "VBB":null, "previous":null}
> {"VAA":"A", "VBB":1, "previous":null}
> {"VAA":"B", "VBB":-1, "previous":1}
> {"VAA":"C", "VBB":2, "previous":-1}
> {"VAA":"d", "VBB":3, "previous":2}
> {code}
> Actual Result:
> {code:borderStyle=solid}
> {"VAA":null, "VBB":null, "previous":null}
> {"VAA":"A", "VBB":1, "previous":103079215105}
> {"VAA":"B", "VBB":-1, "previous":103079215105}
> {"VAA":"C", "VBB":2, "previous":103079215105}
> {"VAA":"d", "VBB":3, "previous":103079215105}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-09-30 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":103079215105}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}





  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}
{code}

Actual Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":103079215105}
{"VAA":"B", "VBB":-1, "previous":103079215105}
{"VAA":"C", "VBB":2, "previous":103079215105}
{"VAA":"d", "VBB":3, "previous":103079215105}
{code}






> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> {code}
> Java:
> {code:borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
> 
> df = df.withColumn(
>   "previous",
>   lag(df.col("VBB"), 1)
>     .over(Window.orderBy(df.col("VAA")))
>   );
> {code}
> Expected Result:
> {code:borderStyle=solid}
> {"VAA":null, "VBB":null, "previous":null}
> {"VAA":"A", "VBB":1, "previous":null}
> {"VAA":"B", "VBB":-1, "previous":1}
> {"VAA":"C", "VBB":2, "previous":-1}
> {"VAA":"d", "VBB":3, "previous":2}
> {code}
> Actual Result:
> {code:borderStyle=solid}
> {"VAA":null, "VBB":null, "previous":103079215105}
> {"VAA":"A", "VBB":1, "previous":103079215105}
> {"VAA":"B", "VBB":-1, "previous":103079215105}
> {"VAA":"C", "VBB":2, "previous":103079215105}
> {"VAA":"d", "VBB":3, "previous":103079215105}

[jira] [Resolved] (SPARK-10807) Add as.data.frame() as a synonym for collect()

2015-09-30 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-10807.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 8908
[https://github.com/apache/spark/pull/8908]

> Add as.data.frame() as a synonym for collect()
> --
>
> Key: SPARK-10807
> URL: https://issues.apache.org/jira/browse/SPARK-10807
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
>Priority: Minor
> Fix For: 1.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10807) Add as.data.frame() as a synonym for collect()

2015-09-30 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman updated SPARK-10807:
--
Assignee: Oscar D. Lara Yejas

> Add as.data.frame() as a synonym for collect()
> --
>
> Key: SPARK-10807
> URL: https://issues.apache.org/jira/browse/SPARK-10807
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
>Assignee: Oscar D. Lara Yejas
>Priority: Minor
> Fix For: 1.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-09-30 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. I did 
not test on a yarn cluster.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );


  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );



> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. I 
> did not test on a yarn cluster.
> Input JSON:
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> Java:
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
> 
> df = df.withColumn(
>   "previous",
>   lag(df.col("VBB"), 1)
>     .over(Window.orderBy(df.col("VAA")))
>   );



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10893) Lag Analytic function broken

2015-09-30 Thread Jo Desmet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
--
Description: 
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );
{code}

Expected Result:
{code:borderStyle=solid}
{"VAA":null, "VBB":null, "previous":null}
{"VAA":"A", "VBB":1, "previous":null}
{"VAA":"B", "VBB":-1, "previous":1}
{"VAA":"C", "VBB":2, "previous":-1}
{"VAA":"d", "VBB":3, "previous":2}

{code}
[null,null,null]
[A,1,null]
[B,-1,1]
[C,2,-1]
[d,3,2]




  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input JSON:
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
  "previous",
  lag(df.col("VBB"), 1)
    .over(Window.orderBy(df.col("VAA")))
  );
{code}


> Lag Analytic function broken
> 
>
> Key: SPARK-10893
> URL: https://issues.apache.org/jira/browse/SPARK-10893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0
> Environment: Spark Standalone Cluster on Linux
>Reporter: Jo Desmet
>
> Trying to aggregate with the LAG Analytic function gives the wrong result. In 
> my testcase it was always giving the fixed value '103079215105' when I tried 
> to run on an integer.
> Note that this only happens on Spark 1.5.0, and only when running in cluster 
> mode.
> It works fine when running on Spark 1.4.1, or when running in local mode. 
> I did not test on a yarn cluster.
> I did not test other analytic aggregates.
> Input JSON:
> {code:borderStyle=solid}
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> {code}
> Java:
> {code:borderStyle=solid}
> SparkContext sc = new SparkContext(conf);
> HiveContext sqlContext = new HiveContext(sc);
> DataFrame df = sqlContext.read().json(getInputPath("input.json"));
> 
> df = df.withColumn(
>   "previous",
>   lag(df.col("VBB"), 1)
>     .over(Window.orderBy(df.col("VAA")))
>   );
> {code}
> Expected Result:
> {code:borderStyle=solid}
> {"VAA":null, "VBB":null, "previous":null}
> {"VAA":"A", "VBB":1, "previous":null}
> {"VAA":"B", "VBB":-1, "previous":1}
> {"VAA":"C", "VBB":2, "previous":-1}
> {"VAA":"d", "VBB":3, "previous":2}
> {code}
> [null,null,null]
> [A,1,null]
> [B,-1,1]
> [C,2,-1]
> [d,3,2]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10894) Add 'drop' support for DataFrame's subset function

2015-09-30 Thread Weiqiang Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939339#comment-14939339
 ] 

Weiqiang Zhuang commented on SPARK-10894:
-

I am working on this.

> Add 'drop' support for DataFrame's subset function
> --
>
> Key: SPARK-10894
> URL: https://issues.apache.org/jira/browse/SPARK-10894
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Weiqiang Zhuang
>
> A SparkR DataFrame can be subset to get one or more columns of the dataset. The 
> current '[' implementation does not support 'drop' when it is asked for just one 
> column. This is not consistent with the R syntax:
> x[i, j, ... , drop = TRUE]
> # in R, when drop is FALSE, remain as data.frame
> > class(iris[, "Sepal.Width", drop=F])
> [1] "data.frame"
> # when drop is TRUE (default), drop to be a vector
> > class(iris[, "Sepal.Width", drop=T])
> [1] "numeric"
> > class(iris[,"Sepal.Width"])
> [1] "numeric"
> > df <- createDataFrame(sqlContext, iris)
> # in SparkR, 'drop' argument has no impact
> > class(df[,"Sepal_Width", drop=F])
> [1] "DataFrame"
> attr(,"package")
> [1] "SparkR"
> # should have dropped to be a Column class instead
> > class(df[,"Sepal_Width", drop=T])
> [1] "DataFrame"
> attr(,"package")
> [1] "SparkR"
> > class(df[,"Sepal_Width"])
> [1] "DataFrame"
> attr(,"package")
> [1] "SparkR"
> We should add the 'drop' support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10894) Add 'drop' support for DataFrame's subset function

2015-09-30 Thread Weiqiang Zhuang (JIRA)
Weiqiang Zhuang created SPARK-10894:
---

 Summary: Add 'drop' support for DataFrame's subset function
 Key: SPARK-10894
 URL: https://issues.apache.org/jira/browse/SPARK-10894
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Reporter: Weiqiang Zhuang


A SparkR DataFrame can be subset to get one or more columns of the dataset. The 
current '[' implementation does not support 'drop' when it is asked for just one 
column. This is not consistent with the R syntax:
x[i, j, ... , drop = TRUE]

# in R, when drop is FALSE, remain as data.frame
> class(iris[, "Sepal.Width", drop=F])
[1] "data.frame"
# when drop is TRUE (default), drop to be a vector
> class(iris[, "Sepal.Width", drop=T])
[1] "numeric"
> class(iris[,"Sepal.Width"])
[1] "numeric"

> df <- createDataFrame(sqlContext, iris)
# in SparkR, 'drop' argument has no impact
> class(df[,"Sepal_Width", drop=F])
[1] "DataFrame"
attr(,"package")
[1] "SparkR"
# should have dropped to be a Column class instead
> class(df[,"Sepal_Width", drop=T])
[1] "DataFrame"
attr(,"package")
[1] "SparkR"
> class(df[,"Sepal_Width"])
[1] "DataFrame"
attr(,"package")
[1] "SparkR"

We should add the 'drop' support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10891) Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10891:


Assignee: Apache Spark

> Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka
> ---
>
> Key: SPARK-10891
> URL: https://issues.apache.org/jira/browse/SPARK-10891
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Burak Yavuz
>Assignee: Apache Spark
>
> There is support for a message handler in the Direct Kafka Stream, which allows 
> an arbitrary T to be the output of the stream instead of Array[Byte]. This is a 
> very useful feature, so it should exist in Kinesis as well.
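
For context, a rough sketch (in Java, with placeholder broker, topic, and offset 
values) of how the message handler is wired into the existing Direct Kafka API; 
the proposed Kinesis variant would presumably take the same shape:

{code}
import java.util.HashMap;
import java.util.Map;

import kafka.common.TopicAndPartition;
import kafka.message.MessageAndMetadata;
import kafka.serializer.StringDecoder;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class DirectKafkaWithHandler {
  public static JavaInputDStream<String> build(JavaStreamingContext jssc) {
    Map<String, String> kafkaParams = new HashMap<>();
    kafkaParams.put("metadata.broker.list", "broker1:9092");   // placeholder

    Map<TopicAndPartition, Long> fromOffsets = new HashMap<>();
    fromOffsets.put(new TopicAndPartition("events", 0), 0L);    // placeholder

    // The message handler turns each Kafka record into an arbitrary type R
    // (here a String combining topic and payload) instead of the raw bytes.
    Function<MessageAndMetadata<String, String>, String> handler =
        new Function<MessageAndMetadata<String, String>, String>() {
          @Override
          public String call(MessageAndMetadata<String, String> mmd) {
            return mmd.topic() + ":" + mmd.message();
          }
        };

    return KafkaUtils.createDirectStream(
        jssc,
        String.class, String.class,
        StringDecoder.class, StringDecoder.class,
        String.class,
        kafkaParams, fromOffsets, handler);
  }
}
{code}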



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10891) Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka

2015-09-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939359#comment-14939359
 ] 

Apache Spark commented on SPARK-10891:
--

User 'brkyvz' has created a pull request for this issue:
https://github.com/apache/spark/pull/8954

> Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka
> ---
>
> Key: SPARK-10891
> URL: https://issues.apache.org/jira/browse/SPARK-10891
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Burak Yavuz
>
> There is support for a message handler in the Direct Kafka Stream, which allows 
> an arbitrary T to be the output of the stream instead of Array[Byte]. This is a 
> very useful feature, so it should exist in Kinesis as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10891) Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10891:


Assignee: (was: Apache Spark)

> Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka
> ---
>
> Key: SPARK-10891
> URL: https://issues.apache.org/jira/browse/SPARK-10891
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Burak Yavuz
>
> There is support for a message handler in the Direct Kafka Stream, which allows 
> an arbitrary T to be the output of the stream instead of Array[Byte]. This is a 
> very useful feature, so it should exist in Kinesis as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5569) Checkpoints cannot reference classes defined outside of Spark's assembly

2015-09-30 Thread Deming Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939353#comment-14939353
 ] 

Deming Zhu commented on SPARK-5569:
---

I encountered exactly the same problem, and I think it is a bug in 
ObjectInputStreamWithLoader.

ObjectInputStreamWithLoader extends the ObjectInputStream class and overrides 
its resolveClass method.
But instead of using Class.forName(desc, false, loader), Spark uses 
loader.loadClass(desc) to resolve the class, which does not work when the class 
is an array type.
For example: 
Class.forName("[Ljava.lang.String;", false, loader) works, while 
loader.loadClass("[Ljava.lang.String;") throws a ClassNotFoundException.
Details of the difference can be found here: 

http://bugs.java.com/view_bug.do?bug_id=6446627

I will make a pull request to Spark.
Until it is accepted, 
you can redefine ObjectInputStreamWithLoader and use it in place of the 
original one.
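
A minimal standalone illustration of that difference (nothing Spark-specific; 
the array class name is just an example):

{code}
public class ArrayClassLoadingDemo {
  public static void main(String[] args) throws Exception {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();

    // Class.forName understands JVM array syntax and resolves String[].class.
    Class<?> viaForName = Class.forName("[Ljava.lang.String;", false, loader);
    System.out.println("Class.forName resolved: " + viaForName);

    try {
      // ClassLoader.loadClass does not parse array descriptors and fails.
      loader.loadClass("[Ljava.lang.String;");
    } catch (ClassNotFoundException e) {
      System.out.println("loadClass threw: " + e);
    }
  }
}
{code}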

> Checkpoints cannot reference classes defined outside of Spark's assembly
> 
>
> Key: SPARK-5569
> URL: https://issues.apache.org/jira/browse/SPARK-5569
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Patrick Wendell
>
> Not sure if this is a bug or a feature, but it's not obvious, so wanted to 
> create a JIRA to make sure we document this behavior.
> First documented by Cody Koeninger:
> https://gist.github.com/koeninger/561a61482cd1b5b3600c
> {code}
> 15/01/12 16:07:07 INFO CheckpointReader: Attempting to load checkpoint from 
> file file:/var/tmp/cp/checkpoint-142110041.bk
> 15/01/12 16:07:07 WARN CheckpointReader: Error reading checkpoint from file 
> file:/var/tmp/cp/checkpoint-142110041.bk
> java.io.IOException: java.lang.ClassNotFoundException: 
> org.apache.spark.rdd.kafka.KafkaRDDPartition
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1043)
> at 
> org.apache.spark.streaming.dstream.DStreamCheckpointData.readObject(DStreamCheckpointData.scala:146)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at 
> java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500)
> at 
> org.apache.spark.streaming.DStreamGraph$$anonfun$readObject$1.apply$mcV$sp(DStreamGraph.scala:180)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1040)
> at 
> org.apache.spark.streaming.DStreamGraph.readObject(DStreamGraph.scala:176)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at 
> 

[jira] [Commented] (SPARK-9472) Consistent hadoop config for streaming

2015-09-30 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936866#comment-14936866
 ] 

Cody Koeninger commented on SPARK-9472:
---

You can get around this by passing in the Hadoop configuration that you want as
an argument to getOrCreate:

StreamingContext.getOrCreate(somePath, () => someFunction, 
  SparkHadoopUtil.get.newConfiguration(someSparkConf))
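
In the Java API the same idea looks roughly like the sketch below; this assumes 
the Function0-based getOrCreate overload that accepts a Hadoop Configuration, 
and the checkpoint directory, app name, and batch interval are placeholders:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function0;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class CheckpointRecovery {
  public static void main(String[] args) throws Exception {
    final String checkpointDir = "/tmp/checkpoint";             // placeholder
    final SparkConf sparkConf = new SparkConf().setAppName("checkpoint-demo");

    // Hadoop configuration Spark should use when reading the checkpoint files.
    final Configuration hadoopConf = new Configuration();

    Function0<JavaStreamingContext> createContext = new Function0<JavaStreamingContext>() {
      @Override
      public JavaStreamingContext call() {
        JavaStreamingContext jssc =
            new JavaStreamingContext(sparkConf, Durations.seconds(10));
        // ... define the DStream graph here ...
        jssc.checkpoint(checkpointDir);
        return jssc;
      }
    };

    JavaStreamingContext jssc =
        JavaStreamingContext.getOrCreate(checkpointDir, createContext, hadoopConf);
    jssc.start();
    jssc.awaitTermination();
  }
}
{code}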




> Consistent hadoop config for streaming
> --
>
> Key: SPARK-9472
> URL: https://issues.apache.org/jira/browse/SPARK-9472
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Cody Koeninger
>Assignee: Cody Koeninger
>Priority: Minor
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10883) Be able to build each module individually

2015-09-30 Thread JIRA
Jean-Baptiste Onofré created SPARK-10883:


 Summary: Be able to build each module individually
 Key: SPARK-10883
 URL: https://issues.apache.org/jira/browse/SPARK-10883
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Jean-Baptiste Onofré


Right now, due to the location of scalastyle-config.xml, it's not possible to 
build an individual module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10884) Support prediction on single instance for PredictionModel and ClassificationModel

2015-09-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936951#comment-14936951
 ] 

Apache Spark commented on SPARK-10884:
--

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/8883

> Support prediction on single instance for PredictionModel and 
> ClassificationModel
> -
>
> Key: SPARK-10884
> URL: https://issues.apache.org/jira/browse/SPARK-10884
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Yanbo Liang
>
> Support prediction on single instance for regression and classification 
> related models. 
> Add corresponding test cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10883) Be able to build each module individually

2015-09-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936898#comment-14936898
 ] 

Apache Spark commented on SPARK-10883:
--

User 'jbonofre' has created a pull request for this issue:
https://github.com/apache/spark/pull/8949

> Be able to build each module individually
> -
>
> Key: SPARK-10883
> URL: https://issues.apache.org/jira/browse/SPARK-10883
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Jean-Baptiste Onofré
>
> Right now, due to the location of scalastyle-config.xml, it's not possible to 
> build an individual module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10883) Be able to build each module individually

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10883:


Assignee: Apache Spark

> Be able to build each module individually
> -
>
> Key: SPARK-10883
> URL: https://issues.apache.org/jira/browse/SPARK-10883
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Jean-Baptiste Onofré
>Assignee: Apache Spark
>
> Right now, due to the location of scalastyle-config.xml, it's not possible to 
> build an individual module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10883) Be able to build each module individually

2015-09-30 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936944#comment-14936944
 ] 

Marcelo Vanzin commented on SPARK-10883:


How are you trying to build the individual module? Something like this works 
fine for me and invokes scalastyle successfully:

{code}
mvn install -pl :spark-launcher_2.10
{code}

I believe you run into problems if you use {{-f path/to/pom.xml}}, which is 
mildly annoying considering there is an alternative. But as your PR shows, 
fixing that case is kinda noisy.

> Be able to build each module individually
> -
>
> Key: SPARK-10883
> URL: https://issues.apache.org/jira/browse/SPARK-10883
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Jean-Baptiste Onofré
>
> Right now, due to the location of scalastyle-config.xml, it's not possible to 
> build an individual module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10413) Model should support prediction on single instance

2015-09-30 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936947#comment-14936947
 ] 

Yanbo Liang commented on SPARK-10413:
-

[~mengxr] [~josephkb]
I found this issue involves too many models and files, so let's separate this 
JIRA into sub-tasks.
I have opened SPARK-10884 to make all classification and regression models 
support prediction on a single instance.
Other community members who are interested in this issue can open other 
sub-tasks and work on them.

> Model should support prediction on single instance
> --
>
> Key: SPARK-10413
> URL: https://issues.apache.org/jira/browse/SPARK-10413
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Reporter: Xiangrui Meng
>Priority: Critical
>
> Currently models in the pipeline API only implement transform(DataFrame). It 
> would be quite useful to support prediction on single instance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10884) Support prediction on single instance for PredictionModel and ClassificationModel

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10884:


Assignee: (was: Apache Spark)

> Support prediction on single instance for PredictionModel and 
> ClassificationModel
> -
>
> Key: SPARK-10884
> URL: https://issues.apache.org/jira/browse/SPARK-10884
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Yanbo Liang
>
> Support prediction on single instance for regression and classification 
> related models. 
> Add corresponding test cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10884) Support prediction on single instance for PredictionModel and ClassificationModel

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10884:


Assignee: Apache Spark

> Support prediction on single instance for PredictionModel and 
> ClassificationModel
> -
>
> Key: SPARK-10884
> URL: https://issues.apache.org/jira/browse/SPARK-10884
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Yanbo Liang
>Assignee: Apache Spark
>
> Support prediction on single instance for regression and classification 
> related models. 
> Add corresponding test cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10883) Be able to build each module individually

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10883:


Assignee: (was: Apache Spark)

> Be able to build each module individually
> -
>
> Key: SPARK-10883
> URL: https://issues.apache.org/jira/browse/SPARK-10883
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Jean-Baptiste Onofré
>
> Right now, due to the location of scalastyle-config.xml, it's not possible to 
> build an individual module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10883) Be able to build each module individually

2015-09-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936899#comment-14936899
 ] 

Jean-Baptiste Onofré commented on SPARK-10883:
--

https://github.com/apache/spark/pull/8949

> Be able to build each module individually
> -
>
> Key: SPARK-10883
> URL: https://issues.apache.org/jira/browse/SPARK-10883
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Jean-Baptiste Onofré
>
> Right now, due to the location of scalastyle-config.xml, it's not possible to 
> build an individual module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-10883) Be able to build each module individually

2015-09-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré updated SPARK-10883:
-
Comment: was deleted

(was: https://github.com/apache/spark/pull/8949)

> Be able to build each module individually
> -
>
> Key: SPARK-10883
> URL: https://issues.apache.org/jira/browse/SPARK-10883
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Jean-Baptiste Onofré
>
> Right now, due to the location of scalastyle-config.xml, it's not possible to 
> build an individual module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10884) Support prediction on single instance for PredictionModel and ClassificationModel

2015-09-30 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-10884:
---

 Summary: Support prediction on single instance for PredictionModel 
and ClassificationModel
 Key: SPARK-10884
 URL: https://issues.apache.org/jira/browse/SPARK-10884
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Reporter: Yanbo Liang


Support prediction on a single instance for PredictionModel, ClassificationModel 
and their subclasses. Add corresponding test cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10884) Support prediction on single instance for PredictionModel and ClassificationModel

2015-09-30 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-10884:

Description: 
Support prediction on single instance for regression and classification related 
models. 
Add corresponding test cases.

  was:Support prediction on single instance for PredictionModel, 
ClassificationModel and their subclass. Add corresponding test cases.


> Support prediction on single instance for PredictionModel and 
> ClassificationModel
> -
>
> Key: SPARK-10884
> URL: https://issues.apache.org/jira/browse/SPARK-10884
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Yanbo Liang
>
> Support prediction on single instance for regression and classification 
> related models. 
> Add corresponding test cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat

2015-09-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937473#comment-14937473
 ] 

Apache Spark commented on SPARK-10058:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/8946

> Flaky test: HeartbeatReceiverSuite: normal heartbeat
> 
>
> Key: SPARK-10058
> URL: https://issues.apache.org/jira/browse/SPARK-10058
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Reporter: Davies Liu
>Assignee: Shixiong Zhu
>Priority: Critical
>  Labels: flaky-test
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/
> {code}
> Error Message
> 3 did not equal 2
> Stacktrace
> sbt.ForkMain$ForkError: 3 did not equal 2
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
>   at 

[jira] [Assigned] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat

2015-09-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10058:


Assignee: Apache Spark  (was: Shixiong Zhu)

> Flaky test: HeartbeatReceiverSuite: normal heartbeat
> 
>
> Key: SPARK-10058
> URL: https://issues.apache.org/jira/browse/SPARK-10058
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Reporter: Davies Liu
>Assignee: Apache Spark
>Priority: Critical
>  Labels: flaky-test
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/
> {code}
> Error Message
> 3 did not equal 2
> Stacktrace
> sbt.ForkMain$ForkError: 3 did not equal 2
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> 
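A note on stabilizing this kind of flake: assertions like "3 did not equal 2" above are usually timing-sensitive, because the test takes a single snapshot of state that other threads may still be updating. Whether that is the root cause in HeartbeatReceiverSuite would need investigation, but a common mitigation in ScalaTest is to poll with the Eventually trait instead of asserting once. The sketch below is illustrative only and is not the actual fix for this ticket; the suite name, counter, and timeouts are invented stand-ins.

{code}
import org.scalatest.FunSuite
import org.scalatest.concurrent.Eventually
import org.scalatest.time.{Millis, Seconds, Span}

// Hypothetical suite illustrating the polling pattern; not HeartbeatReceiverSuite itself.
class HeartbeatCountSuite extends FunSuite with Eventually {

  // Stand-in for state that is updated asynchronously while the test runs.
  @volatile private var registeredExecutors = 0

  test("normal heartbeat (stabilized sketch)") {
    // Simulate an update arriving from another thread after the test body starts.
    new Thread(new Runnable {
      override def run(): Unit = { registeredExecutors = 2 }
    }).start()

    // Poll until the expected value is observed (or the timeout expires),
    // instead of asserting a single, possibly premature, snapshot.
    eventually(timeout(Span(5, Seconds)), interval(Span(50, Millis))) {
      assert(registeredExecutors === 2)
    }
  }
}
{code}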

[jira] [Commented] (SPARK-9472) Consistent hadoop config for streaming

2015-09-30 Thread Russell Alexander Spitzer (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937471#comment-14937471
 ] 

Russell Alexander Spitzer commented on SPARK-9472:
--

Yeah, that's the workaround we recommend now, but it requires every
application to manually specify the files. We just want our distribution to
not require that much manual intervention (normally we automatically pass in
the required hadoop conf, based on the user's security setup, via spark.hadoop).
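
For reference, the spark.hadoop mechanism mentioned above works by copying any property with the "spark.hadoop." prefix (minus the prefix) into the Hadoop Configuration that Spark builds for the SparkContext. A minimal sketch follows; the specific keys, values, and app/master settings are illustrative placeholders, not this deployment's actual configuration.

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Keys prefixed with "spark.hadoop." are propagated into the Hadoop
// Configuration used by the SparkContext. Values below are placeholders.
val conf = new SparkConf()
  .setAppName("spark-hadoop-conf-example")
  .setMaster("local[*]")
  .set("spark.hadoop.hadoop.security.authentication", "kerberos")
  .set("spark.hadoop.fs.defaultFS", "hdfs://namenode:8020")

val sc = new SparkContext(conf)

// The same settings are now visible on the Hadoop side:
println(sc.hadoopConfiguration.get("hadoop.security.authentication")) // kerberos
println(sc.hadoopConfiguration.get("fs.defaultFS"))                   // hdfs://namenode:8020
{code}

The complaint in the comment above is that this propagation is not applied consistently on the streaming side, which is what SPARK-9472 tracks.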

> Consistent hadoop config for streaming
> --
>
> Key: SPARK-9472
> URL: https://issues.apache.org/jira/browse/SPARK-9472
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Cody Koeninger
>Assignee: Cody Koeninger
>Priority: Minor
> Fix For: 1.5.0
>
>






