[jira] [Resolved] (SPARK-3747) TaskResultGetter could incorrectly abort a stage if it cannot get result for a specific task

2014-10-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-3747.

      Resolution: Fixed
   Fix Version/s: 1.2.0, 1.1.1
Target Version/s: 1.1.1, 1.2.0  (was: 1.1.0, 1.2.0)

> TaskResultGetter could incorrectly abort a stage if it cannot get result for 
> a specific task
> 
>
> Key: SPARK-3747
> URL: https://issues.apache.org/jira/browse/SPARK-3747
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 1.1.1, 1.2.0
>
>
> There is a "return" in logUncaughtExceptions, but we are catching all 
> Exceptions. Instead, we should be catching NonFatal.
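
A minimal sketch of the distinction, using an illustrative helper that is not the 
actual Spark Utils.logUncaughtExceptions source: scala.util.control.NonFatal 
excludes ControlThrowable (which is what a non-local "return" inside a closure 
compiles to) and fatal VM errors, so catching NonFatal avoids treating the early 
return as a failure.

{code}
import scala.util.control.NonFatal

// Illustrative helper only; not the actual Spark Utils.logUncaughtExceptions.
def logUncaughtExceptions[T](f: => T): T = {
  try {
    f
  } catch {
    // NonFatal does not match ControlThrowable (used by a non-local "return"
    // inside a closure) or fatal errors such as VirtualMachineError, so those
    // propagate untouched instead of being logged as failures.
    case NonFatal(t) =>
      println(s"Uncaught exception in thread ${Thread.currentThread().getName}: $t")
      throw t
  }
}

// With a broad "case t: Throwable" the NonLocalReturnControl thrown by this
// early "return" would be intercepted as well.
def lookup(key: String, table: Map[String, Int]): Int = logUncaughtExceptions {
  if (!table.contains(key)) return -1
  table(key)
}
{code}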



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3755) Do not bind port 1 - 1024 to server in spark

2014-10-01 Thread wangfei (JIRA)
wangfei created SPARK-3755:
--

 Summary: Do not bind port 1 - 1024 to server in spark
 Key: SPARK-3755
 URL: https://issues.apache.org/jira/browse/SPARK-3755
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: wangfei


A non-root user who uses a port in the range 1-1024 to start the Jetty server 
will get the exception "java.net.SocketException: Permission denied".
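
A hedged sketch of a pre-bind port check (illustrative only, not Spark's actual 
port-selection code; the port numbers are placeholders): ports 1-1023 are 
privileged on Unix-like systems, so a non-root process should use ports >= 1024 
or request an ephemeral port with 0.

{code}
// Illustrative guard, not the actual Spark code. Binding a privileged port as
// a non-root user fails with a "Permission denied" SocketException/BindException.
def checkUserPort(port: Int): Unit = {
  require(port == 0 || (port >= 1024 && port <= 65535),
    s"Port $port is privileged (1-1023) or out of range; non-root processes " +
      "should use ports >= 1024, or 0 to request an ephemeral port")
}

checkUserPort(8080)                  // ok
checkUserPort(0)                     // ok: ephemeral port
// checkUserPort(80)                 // throws IllegalArgumentException
// new java.net.ServerSocket(80)     // as non-root: "Permission denied"
{code}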



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3755) Do not bind port 1 - 1024 to server in spark

2014-10-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154517#comment-14154517
 ] 

Apache Spark commented on SPARK-3755:
-

User 'scwf' has created a pull request for this issue:
https://github.com/apache/spark/pull/2610

> Do not bind port 1 - 1024 to server in spark
> 
>
> Key: SPARK-3755
> URL: https://issues.apache.org/jira/browse/SPARK-3755
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: wangfei
>
> A non-root user who uses a port in the range 1-1024 to start the Jetty server 
> will get the exception "java.net.SocketException: Permission denied".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3756) check exception is caused by an address-port collision when binding properly

2014-10-01 Thread wangfei (JIRA)
wangfei created SPARK-3756:
--

 Summary: check exception is caused by an address-port collision 
when binding properly
 Key: SPARK-3756
 URL: https://issues.apache.org/jira/browse/SPARK-3756
 Project: Spark
  Issue Type: Bug
Reporter: wangfei






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3756) check exception is caused by an address-port collision when binding properly

2014-10-01 Thread wangfei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangfei updated SPARK-3756:
---
Affects Version/s: 1.1.0

> check exception is caused by an address-port collision when binding properly
> 
>
> Key: SPARK-3756
> URL: https://issues.apache.org/jira/browse/SPARK-3756
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: wangfei
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3756) check exception is caused by an address-port collision when binding properly

2014-10-01 Thread wangfei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangfei updated SPARK-3756:
---
 Description: a tiny bug in method  isBindCollision
Target Version/s: 1.2.0

> check exception is caused by an address-port collision when binding properly
> 
>
> Key: SPARK-3756
> URL: https://issues.apache.org/jira/browse/SPARK-3756
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: wangfei
>
> a tiny bug in method  isBindCollision
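
A sketch of the kind of check an isBindCollision helper performs (illustrative 
only; this is not the actual Spark Utils.isBindCollision, whose "tiny bug" is the 
subject of this ticket):

{code}
import java.net.BindException

// Illustrative address-port-collision check, not the Spark implementation.
// Intended behaviour: treat an exception as a bind collision only when a
// BindException with "Address already in use" appears somewhere in the cause
// chain.
def isBindCollision(exception: Throwable): Boolean = exception match {
  case e: BindException =>
    e.getMessage != null && e.getMessage.contains("Address already in use")
  case e: Exception if e.getCause != null =>
    isBindCollision(e.getCause)   // servers often wrap the original BindException
  case _ => false
}
{code}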



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3756) check exception is caused by an address-port collision properly

2014-10-01 Thread wangfei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangfei updated SPARK-3756:
---
Summary: check exception is caused by an address-port collision properly  
(was: check exception is caused by an address-port collision when binding 
properly)

> check exception is caused by an address-port collision properly
> ---
>
> Key: SPARK-3756
> URL: https://issues.apache.org/jira/browse/SPARK-3756
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: wangfei
>
> a tiny bug in method  isBindCollision



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3748) Log thread name in unit test logs

2014-10-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-3748.

   Resolution: Fixed
Fix Version/s: 1.2.0

> Log thread name in unit test logs
> -
>
> Key: SPARK-3748
> URL: https://issues.apache.org/jira/browse/SPARK-3748
> Project: Spark
>  Issue Type: New Feature
>  Components: Project Infra, Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Minor
> Fix For: 1.2.0
>
>
> Thread names are often useful for correlating failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3751) DecisionTreeRunner functionality improvement

2014-10-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-3751:
-
Assignee: Joseph K. Bradley

> DecisionTreeRunner functionality improvement
> 
>
> Key: SPARK-3751
> URL: https://issues.apache.org/jira/browse/SPARK-3751
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Minor
> Fix For: 1.2.0
>
>
> DecisionTreeRunner functionality additions:
> * Allow user to pass in a test dataset
> * Do not print full model if the model is too large.
> As part of this, modify DecisionTreeModel and RandomForestModel to allow 
> printing less info.  Proposed updates:
> * toString: prints model summary
> * toDebugString: prints full model (named after RDD.toDebugString)
> Similar update to Python API:
> * __repr__() now prints a model summary
> * toDebugString() now prints the full model
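
A minimal usage sketch, assuming the toString/toDebugString split lands as 
described above (the data path and tree parameters below are placeholders):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.util.MLUtils

val sc = new SparkContext(new SparkConf().setAppName("DecisionTreeSummaryDemo"))

// Placeholder path; any LIBSVM-formatted training set works here.
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

// Train a small classification tree (2 classes, gini impurity, depth 5, 32 bins).
val model = DecisionTree.trainClassifier(data, 2, Map[Int, Int](), "gini", 5, 32)

println(model.toString)        // short summary (e.g. node count and depth)
println(model.toDebugString)   // full tree structure; can be very large
{code}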



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3751) DecisionTreeRunner functionality improvement

2014-10-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-3751.
--
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2604
[https://github.com/apache/spark/pull/2604]

> DecisionTreeRunner functionality improvement
> 
>
> Key: SPARK-3751
> URL: https://issues.apache.org/jira/browse/SPARK-3751
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Minor
> Fix For: 1.2.0
>
>
> DecisionTreeRunner functionality additions:
> * Allow user to pass in a test dataset
> * Do not print full model if the model is too large.
> As part of this, modify DecisionTreeModel and RandomForestModel to allow 
> printing less info.  Proposed updates:
> * toString: prints model summary
> * toDebugString: prints full model (named after RDD.toDebugString)
> Similar update to Python API:
> * __repr__() now prints a model summary
> * toDebugString() now prints the full model



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3756) check exception is caused by an address-port collision properly

2014-10-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154549#comment-14154549
 ] 

Apache Spark commented on SPARK-3756:
-

User 'scwf' has created a pull request for this issue:
https://github.com/apache/spark/pull/2611

> check exception is caused by an address-port collision properly
> ---
>
> Key: SPARK-3756
> URL: https://issues.apache.org/jira/browse/SPARK-3756
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: wangfei
>
> a tiny bug in method  isBindCollision



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3757) mvn clean doesn't delete some files

2014-10-01 Thread Masayoshi TSUZUKI (JIRA)
Masayoshi TSUZUKI created SPARK-3757:


 Summary: mvn clean doesn't delete some files
 Key: SPARK-3757
 URL: https://issues.apache.org/jira/browse/SPARK-3757
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.1.0
Reporter: Masayoshi TSUZUKI
Priority: Trivial


When we build Spark using {{mvn package}},
{{/python/lib/py4j-0.8.2.1-src.zip}} is unzipped and some py4j files (*.py) are 
created in {{/python/build/}}.
But this directory and these files are not deleted by {{mvn clean}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3758) Wrong EOL character in *.cmd

2014-10-01 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created SPARK-3758:
-

 Summary: Wrong EOL character in *.cmd
 Key: SPARK-3758
 URL: https://issues.apache.org/jira/browse/SPARK-3758
 Project: Spark
  Issue Type: Bug
  Components: Windows
Affects Versions: 1.2.0
Reporter: Kousuke Saruta


The Windows platform expects CRLF as the EOL characters, but in every *.cmd file 
except compute-classpath.cmd the EOL characters are LF.
To avoid unexpected problems, we should replace LF with CRLF in the *.cmd files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3758) Wrong EOL character in *.cmd

2014-10-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154595#comment-14154595
 ] 

Apache Spark commented on SPARK-3758:
-

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/2612

> Wrong EOL character in *.cmd
> 
>
> Key: SPARK-3758
> URL: https://issues.apache.org/jira/browse/SPARK-3758
> Project: Spark
>  Issue Type: Bug
>  Components: Windows
>Affects Versions: 1.2.0
>Reporter: Kousuke Saruta
>
> The Windows platform expects CRLF as the EOL characters, but in every *.cmd 
> file except compute-classpath.cmd the EOL characters are LF.
> To avoid unexpected problems, we should replace LF with CRLF in the *.cmd files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3603) InvalidClassException on a Linux VM - probably problem with serialization

2014-10-01 Thread Richard Cross (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154600#comment-14154600
 ] 

Richard Cross commented on SPARK-3603:
--

Hi, I'm working with Tomasz on the same project.  We have 3 ostensibly 
identical Linux Servers (same OS/Kernel, same Java, same version of Spark).  

Each one is a completely independent testing environment, running Spark 1.0.0 
in standalone mode with one Spark Master and one Worker.

The problem is that our application works on 1 machine, and fails with the 
above error on the other 2... and we cannot find out what is different about 
those 2 machines that would cause this error.  We think we have ruled out 
Endian-ness, as all three machines are Little-Endian.
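
One hedged way to narrow this down (not something suggested in the ticket): 
compute the serialVersionUID that each machine derives locally for the class 
named in the exception and compare the values; the class name below comes from 
the stack trace, the rest is an assumption about how you would run the check.

{code}
import java.io.ObjectStreamClass

// Print the locally computed serialVersionUID of the class from the exception.
// Run this on each machine with the same classpath the driver/executors use;
// differing values confirm that the scala-library bytecode really differs.
val clazz = Class.forName("scala.reflect.ClassTag$$anon$1")
val desc  = ObjectStreamClass.lookup(clazz)
println(s"${clazz.getName} serialVersionUID = ${desc.getSerialVersionUID}")
{code}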

> InvalidClassException on a Linux VM - probably problem with serialization
> -
>
> Key: SPARK-3603
> URL: https://issues.apache.org/jira/browse/SPARK-3603
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.1.0
> Environment: Linux version 2.6.32-358.32.3.el6.x86_64 
> (mockbu...@x86-029.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red 
> Hat 4.4.7-3) (GCC) ) #1 SMP Fri Jan 17 08:42:31 EST 2014
> java version "1.7.0_25"
> OpenJDK Runtime Environment (rhel-2.3.10.4.el6_4-x86_64)
> OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
> Spark (either 1.0.0 or 1.1.0)
>Reporter: Tomasz Dudziak
>Priority: Critical
>  Labels: scala, serialization, spark
>
> I have a Scala app connecting to a standalone Spark cluster. It works fine on 
> Windows or on a Linux VM; however, when I try to run the app and the Spark 
> cluster on another Linux VM (the same Linux kernel, Java and Spark - tested 
> for versions 1.0.0 and 1.1.0) I get the below exception. This looks kind of 
> similar to the Big-Endian (IBM Power7) Spark Serialization issue 
> (SPARK-2018), but... my system is definitely little endian and I understand 
> the big endian issue should already be fixed in Spark 1.1.0 anyway. I'd 
> appreciate your help.
> 01:34:53.251 WARN  [Result resolver thread-0][TaskSetManager] Lost TID 2 
> (task 1.0:2)
> 01:34:53.278 WARN  [Result resolver thread-0][TaskSetManager] Loss was due to 
> java.io.InvalidClassException
> java.io.InvalidClassException: scala.reflect.ClassTag$$anon$1; local class 
> incompatible: stream classdesc serialVersionUID = -4937928798201944954, local 
> class serialVersionUID = -8102093212602380348
> at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
> at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
> at 
> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1769)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at 
> java.io.ObjectInputStream.defaultReadF

[jira] [Commented] (SPARK-3757) mvn clean doesn't delete some files

2014-10-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154609#comment-14154609
 ] 

Apache Spark commented on SPARK-3757:
-

User 'tsudukim' has created a pull request for this issue:
https://github.com/apache/spark/pull/2613

> mvn clean doesn't delete some files
> ---
>
> Key: SPARK-3757
> URL: https://issues.apache.org/jira/browse/SPARK-3757
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
>Reporter: Masayoshi TSUZUKI
>Priority: Trivial
>
> When we build Spark using {{mvn package}},
> {{/python/lib/py4j-0.8.2.1-src.zip}} is unzipped and some py4j files (*.py) 
> are created in {{/python/build/}}.
> But this directory and these files are not deleted by {{mvn clean}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3759) SparkSubmitDriverBootstrapper should return exit code of driver process

2014-10-01 Thread Eric Eijkelenboom (JIRA)
Eric Eijkelenboom created SPARK-3759:


 Summary: SparkSubmitDriverBootstrapper should return exit code of 
driver process
 Key: SPARK-3759
 URL: https://issues.apache.org/jira/browse/SPARK-3759
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 1.1.0
 Environment: Linux, Windows, Scala/Java
Reporter: Eric Eijkelenboom
Priority: Minor


SparkSubmitDriverBootstrapper.scala currently always returns exit code 0. 
Instead, it should return the exit code of the driver process.

Suggested code change in SparkSubmitDriverBootstrapper, line 157: 

{code}
val returnCode = process.waitFor()
sys.exit(returnCode)
{code}
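
For context, a self-contained sketch of the same pattern (illustrative only, not 
the SparkSubmitDriverBootstrapper source; the child command is a placeholder): 
block on the child process and surface its exit status instead of always 
exiting 0.

{code}
// Standalone illustration of propagating a child process's exit code.
object ExitCodeDemo {
  def main(args: Array[String]): Unit = {
    // Placeholder command standing in for the launched driver JVM.
    val process = new ProcessBuilder("java", "-version").inheritIO().start()
    val returnCode = process.waitFor()   // block until the child exits
    sys.exit(returnCode)                 // report the child's status, not 0
  }
}
{code}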

Workaround for this issue: 
Instead of specifying 'driver.extra*' properties in spark-defaults.conf, pass 
these properties to spark-submit directly. This will launch the driver program 
without the use of SparkSubmitDriverBootstrapper. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1812) Support cross-building with Scala 2.11

2014-10-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154653#comment-14154653
 ] 

Apache Spark commented on SPARK-1812:
-

User 'ScrapCodes' has created a pull request for this issue:
https://github.com/apache/spark/pull/2615

> Support cross-building with Scala 2.11
> --
>
> Key: SPARK-1812
> URL: https://issues.apache.org/jira/browse/SPARK-1812
> Project: Spark
>  Issue Type: New Feature
>  Components: Build, Spark Core
>Reporter: Matei Zaharia
>Assignee: Prashant Sharma
>
> Since Scala 2.10/2.11 are source compatible, we should be able to cross build 
> for both versions. From what I understand there are basically three things we 
> need to figure out:
> 1. Have a two versions of our dependency graph, one that uses 2.11 
> dependencies and the other that uses 2.10 dependencies.
> 2. Figure out how to publish different poms for 2.10 and 2.11.
> I think (1) can be accomplished by having a scala 2.11 profile. (2) isn't 
> really well supported by Maven since published pom's aren't generated 
> dynamically. But we can probably script around it to make it work. I've done 
> some initial sanity checks with a simple build here:
> https://github.com/pwendell/scala-maven-crossbuild
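
For comparison only: this issue is about the Maven build, but the sbt side of 
cross-building is a one-liner, shown here as a hedged illustration of the same 
idea (versions below are placeholders).

{code}
// build.sbt sketch: cross-build the same sources for two Scala versions.
scalaVersion := "2.10.4"

crossScalaVersions := Seq("2.10.4", "2.11.2")

// The %% operator appends the Scala binary suffix (_2.10 / _2.11) per build.
libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.1" % "test"

// `sbt +compile` / `sbt +publish` then run the task once per listed version.
{code}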



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3232) Backport SPARK-3006 into branch-1.0

2014-10-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154737#comment-14154737
 ] 

François Garillot commented on SPARK-3232:
--

This is closed by 
https://github.com/apache/spark/commit/8dd7690e2b4c3269d2777d3e208903bf596d1509

> Backport SPARK-3006 into branch-1.0
> ---
>
> Key: SPARK-3232
> URL: https://issues.apache.org/jira/browse/SPARK-3232
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Windows
>Affects Versions: 1.0.2
> Environment: Windows
>Reporter: Kousuke Saruta
>Priority: Blocker
>
> As well as SPARK-3216, we need to backport SPARK-3006 into branch-1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3007) Add "Dynamic Partition" support to Spark Sql hive

2014-10-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154776#comment-14154776
 ] 

Apache Spark commented on SPARK-3007:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/2616

> Add "Dynamic Partition" support  to  Spark Sql hive
> ---
>
> Key: SPARK-3007
> URL: https://issues.apache.org/jira/browse/SPARK-3007
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: baishuo
> Fix For: 1.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3593) Support Sorting of Binary Type Data

2014-10-01 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154863#comment-14154863
 ] 

Venkata Ramana G commented on SPARK-3593:
-

BinaryType currently does not derive from NativeType and has no Ordering support.
BinaryType can therefore be moved under NativeType, since it already has a 
JvmType defined, and an Ordering needs to be implemented for it.
Hive also classifies BinaryType under primitive types, keeping complex types such 
as arrays, maps, structs, and unions as complex types.
This is similar to the current TimestampType handling.
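
A sketch of the kind of lexicographic ordering such a change needs (illustrative 
only, not the Catalyst implementation): compare byte arrays as unsigned bytes, 
then by length.

{code}
// Illustrative Ordering over byte arrays; not the actual Catalyst code.
implicit val binaryOrdering: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
  def compare(x: Array[Byte], y: Array[Byte]): Int = {
    val len = math.min(x.length, y.length)
    var i = 0
    while (i < len) {
      // Compare as unsigned bytes so 0xFF sorts after 0x01.
      val c = (x(i) & 0xff) - (y(i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    x.length - y.length   // a strict prefix sorts first
  }
}

// Example: sorts by unsigned byte value, then by length.
val sorted = Seq(Array[Byte](1, 2), Array[Byte](1), Array[Byte](-1))
  .sorted(binaryOrdering)
{code}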

> Support Sorting of Binary Type Data
> ---
>
> Key: SPARK-3593
> URL: https://issues.apache.org/jira/browse/SPARK-3593
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Paul Magid
>
> If you try sorting on a binary field you currently get an exception.   Please 
> add support for binary data type sorting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-10-01 Thread Yi Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154893#comment-14154893
 ] 

Yi Tian commented on SPARK-3687:


yes

> Spark hang while processing more than 100 sequence files
> 
>
> Key: SPARK-3687
> URL: https://issues.apache.org/jira/browse/SPARK-3687
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2, 1.1.0
>Reporter: Ziv Huang
>
> In my application, I read more than 100 sequence files into a JavaPairRDD, 
> perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
> result.
> Quite often (but not always), Spark hangs while executing some of the 
> 120th-150th tasks.
> In 1.0.2, the job can hang for several hours, maybe forever (I can't wait for 
> its completion).
> When the Spark job hangs, I can't kill the job from the web UI.
> In 1.1.0, the job hangs for a couple of minutes (about 3 minutes, actually), 
> and then the web UI of the Spark master shows that the job has finished with 
> state "FAILED".
> In addition, the job stage web UI still hangs, and the execution duration 
> keeps accumulating.
> For both 1.0.2 and 1.1.0, the job hangs with no error messages anywhere.
> The current workaround is to use coalesce to reduce the number of partitions 
> to be processed.
> I have never had a job hang when the number of partitions to be processed is 
> no greater than 100.
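
A hedged sketch of the workaround mentioned above (the path, the partition cap, 
and the use of textFile instead of sequence files are placeholders): cap the 
partition count with coalesce before the action.

{code}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("CoalesceWorkaroundDemo"))

// Placeholder input; the report reads >100 sequence files into a pair RDD.
val words = sc.textFile("hdfs:///placeholder/input/*")
  .coalesce(100)                 // cap at 100 partitions, per the workaround
  .flatMap(_.split("\\s+"))

println(words.takeOrdered(10).mkString(", "))
{code}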



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3593) Support Sorting of Binary Type Data

2014-10-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154897#comment-14154897
 ] 

Apache Spark commented on SPARK-3593:
-

User 'gvramana' has created a pull request for this issue:
https://github.com/apache/spark/pull/2617

> Support Sorting of Binary Type Data
> ---
>
> Key: SPARK-3593
> URL: https://issues.apache.org/jira/browse/SPARK-3593
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Paul Magid
>
> If you try sorting on a binary field you currently get an exception.   Please 
> add support for binary data type sorting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3760) Add Twitter4j FilterQuery to spark streaming twitter API

2014-10-01 Thread Eugene Zhulenev (JIRA)
Eugene Zhulenev created SPARK-3760:
--

 Summary: Add Twitter4j FilterQuery to spark streaming twitter API
 Key: SPARK-3760
 URL: https://issues.apache.org/jira/browse/SPARK-3760
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.1.0
Reporter: Eugene Zhulenev
Priority: Minor


TwitterUtils.createStream(...) allows users to specify keywords that restrict the 
tweets that are returned. However, FilterQuery from Twitter4j has a number of 
other options, including the location filter requested in SPARK-2788. The best 
solution would be to add an alternative createStream method that takes a 
FilterQuery argument instead of keywords.
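
A hedged sketch of what a FilterQuery can express beyond plain keywords: the 
Twitter4j calls below are existing API, while the createStream overload shown in 
the comment is only the proposal from this ticket and does not exist yet; 
coordinates and search terms are placeholders.

{code}
import twitter4j.FilterQuery

// Twitter4j FilterQuery: keywords plus options the keyword-only API cannot set.
val query = new FilterQuery()
  .track("spark", "hadoop")                                      // keyword filter
  .locations(Array(Array(-122.75, 36.8), Array(-121.75, 37.8)))  // bounding box

// Proposed (hypothetical) overload, per this ticket:
//   TwitterUtils.createStream(ssc, twitterAuth, query)
{code}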



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3760) Add Twitter4j FilterQuery to spark streaming twitter API

2014-10-01 Thread Eugene Zhulenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Zhulenev updated SPARK-3760:
---
Description: 
TwitterUtils.createStream(...) allows users to specify keywords that restrict the 
tweets that are returned. However, FilterQuery from Twitter4j has a number of 
other options, including the location filter requested in SPARK-2788. The best 
solution would be to add an alternative createStream method that takes a 
FilterQuery argument instead of keywords.

Pull Request: https://github.com/apache/spark/pull/2618

  was:TwitterUtils.createStream(...) allows users to specify keywords that 
restrict the tweets that are returned. However, FilterQuery from Twitter4j has a 
number of other options, including the location filter requested in SPARK-2788. 
The best solution would be to add an alternative createStream method that takes 
a FilterQuery argument instead of keywords.


> Add Twitter4j FilterQuery to spark streaming twitter API
> 
>
> Key: SPARK-3760
> URL: https://issues.apache.org/jira/browse/SPARK-3760
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.1.0
>Reporter: Eugene Zhulenev
>Priority: Minor
>
> TwitterUtils.createStream(...) allows users to specify keywords that restrict 
> the tweets that are returned. However, FilterQuery from Twitter4j has a number 
> of other options, including the location filter requested in SPARK-2788. The 
> best solution would be to add an alternative createStream method that takes a 
> FilterQuery argument instead of keywords.
> Pull Request: https://github.com/apache/spark/pull/2618



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3760) Add Twitter4j FilterQuery to spark streaming twitter API

2014-10-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154917#comment-14154917
 ] 

Apache Spark commented on SPARK-3760:
-

User 'ezhulenev' has created a pull request for this issue:
https://github.com/apache/spark/pull/2618

> Add Twitter4j FilterQuery to spark streaming twitter API
> 
>
> Key: SPARK-3760
> URL: https://issues.apache.org/jira/browse/SPARK-3760
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.1.0
>Reporter: Eugene Zhulenev
>Priority: Minor
>
> TwitterUtils.createStream(...) allows users to specify keywords that restrict 
> the tweets that are returned. However, FilterQuery from Twitter4j has a number 
> of other options, including the location filter requested in SPARK-2788. The 
> best solution would be to add an alternative createStream method that takes a 
> FilterQuery argument instead of keywords.
> Pull Request: https://github.com/apache/spark/pull/2618



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3761) Class not found exception / sbt 13.5 / Scala 2.10.4

2014-10-01 Thread Igor Tkachenko (JIRA)
Igor Tkachenko created SPARK-3761:
-

 Summary: Class not found exception / sbt 13.5 / Scala 2.10.4
 Key: SPARK-3761
 URL: https://issues.apache.org/jira/browse/SPARK-3761
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Igor Tkachenko


I have Scala code:

val master = "spark://:7077"

val sc = new SparkContext(new SparkConf()
  .setMaster(master)
  .setAppName("SparkQueryDemo 01")
  .set("spark.executor.memory", "512m"))

val count2 = sc .textFile("hdfs://:8020/tmp/data/risk/account.txt")
  .filter(line  => line.contains("Barclays"))
  .count()

I've got such an error:
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0.0:0 failed 4 times, most
recent failure: Exception failure in TID 6 on host : 
java.lang.ClassNotFoundExcept
ion: SimpleApp$$anonfun$1

My dependencies :

object Version {
  val spark= "1.0.0-cdh5.1.0"
  val hadoop   = "2.4.1"
  val slf4j= "1.7.6"
  val logback  = "1.1.1"
  val scalaTest= "2.1.0"
  val mockito  = "1.9.5"
}

object Library {
  val sparkCore  = "org.apache.spark"  %% "spark-assembly"  % Version.spark
  val hadoopClient   = "org.apache.hadoop" %  "hadoop-client"   % Version.hadoop
  val slf4jApi   = "org.slf4j" %  "slf4j-api"   % Version.slf4j
  val logbackClassic = "ch.qos.logback"%  "logback-classic" % 
Version.logback
  val scalaTest  = "org.scalatest" %% "scalatest"   % 
Version.scalaTest
  val mockitoAll = "org.mockito"   %  "mockito-all" % 
Version.mockito
}

My OS is Win 7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3761) Class not found exception / sbt 13.5 / Scala 2.10.4

2014-10-01 Thread Igor Tkachenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Tkachenko updated SPARK-3761:
--
Description: 
I have Scala code:

val master = "spark://:7077"

val sc = new SparkContext(new SparkConf()
  .setMaster(master)
  .setAppName("SparkQueryDemo 01")
  .set("spark.executor.memory", "512m"))

val count2 = sc .textFile("hdfs://:8020/tmp/data/risk/account.txt")
  .filter(line  => line.contains("Word"))
  .count()

I've got such an error:
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0.0:0 failed 4 times, most
recent failure: Exception failure in TID 6 on host : 
java.lang.ClassNotFoundExcept
ion: SimpleApp$$anonfun$1

My dependencies :

object Version {
  val spark= "1.0.0-cdh5.1.0"
  val hadoop   = "2.4.1"
  val slf4j= "1.7.6"
  val logback  = "1.1.1"
  val scalaTest= "2.1.0"
  val mockito  = "1.9.5"
}

object Library {
  val sparkCore  = "org.apache.spark"  %% "spark-assembly"  % Version.spark
  val hadoopClient   = "org.apache.hadoop" %  "hadoop-client"   % Version.hadoop
  val slf4jApi   = "org.slf4j" %  "slf4j-api"   % Version.slf4j
  val logbackClassic = "ch.qos.logback"%  "logback-classic" % 
Version.logback
  val scalaTest  = "org.scalatest" %% "scalatest"   % 
Version.scalaTest
  val mockitoAll = "org.mockito"   %  "mockito-all" % 
Version.mockito
}

My OS is Win 7

  was:
I have Scala code:

val master = "spark://:7077"

val sc = new SparkContext(new SparkConf()
  .setMaster(master)
  .setAppName("SparkQueryDemo 01")
  .set("spark.executor.memory", "512m"))

val count2 = sc .textFile("hdfs://:8020/tmp/data/risk/account.txt")
  .filter(line  => line.contains("Barclays"))
  .count()

I've got such an error:
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0.0:0 failed 4 times, most
recent failure: Exception failure in TID 6 on host : 
java.lang.ClassNotFoundExcept
ion: SimpleApp$$anonfun$1

My dependencies :

object Version {
  val spark= "1.0.0-cdh5.1.0"
  val hadoop   = "2.4.1"
  val slf4j= "1.7.6"
  val logback  = "1.1.1"
  val scalaTest= "2.1.0"
  val mockito  = "1.9.5"
}

object Library {
  val sparkCore  = "org.apache.spark"  %% "spark-assembly"  % Version.spark
  val hadoopClient   = "org.apache.hadoop" %  "hadoop-client"   % Version.hadoop
  val slf4jApi   = "org.slf4j" %  "slf4j-api"   % Version.slf4j
  val logbackClassic = "ch.qos.logback"%  "logback-classic" % 
Version.logback
  val scalaTest  = "org.scalatest" %% "scalatest"   % 
Version.scalaTest
  val mockitoAll = "org.mockito"   %  "mockito-all" % 
Version.mockito
}

My OS is Win 7


> Class not found exception / sbt 13.5 / Scala 2.10.4
> ---
>
> Key: SPARK-3761
> URL: https://issues.apache.org/jira/browse/SPARK-3761
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Igor Tkachenko
>
> I have Scala code:
> val master = "spark://:7077"
> val sc = new SparkContext(new SparkConf()
>   .setMaster(master)
>   .setAppName("SparkQueryDemo 01")
>   .set("spark.executor.memory", "512m"))
> val count2 = sc .textFile("hdfs:// address>:8020/tmp/data/risk/account.txt")
>   .filter(line  => line.contains("Word"))
>   .count()
> I've got such an error:
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0:0 failed 4 times, most
> recent failure: Exception failure in TID 6 on host : 
> java.lang.ClassNotFoundExcept
> ion: SimpleApp$$anonfun$1
> My dependencies :
> object Version {
>   val spark= "1.0.0-cdh5.1.0"
>   val hadoop   = "2.4.1"
>   val slf4j= "1.7.6"
>   val logback  = "1.1.1"
>   val scalaTest= "2.1.0"
>   val mockito  = "1.9.5"
> }
> object Library {
>   val sparkCore  = "org.apache.spark"  %% "spark-assembly"  % 
> Version.spark
>   val hadoopClient   = "org.apache.hadoop" %  "hadoop-client"   % 
> Version.hadoop
>   val slf4jApi   = "org.slf4j" %  "slf4j-api"   % 
> Version.slf4j
>   val logbackClassic = "ch.qos.logback"%  "logback-classic" % 
> Version.logback
>   val scalaTest  = "org.scalatest" %% "scalatest"   % 
> Version.scalaTest
>   val mockitoAll = "org.mockito"   %  "mockito-all" % 
> Version.mockito
> }
> My OS is Win 7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3757) mvn clean doesn't delete some files

2014-10-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-3757.

   Resolution: Fixed
Fix Version/s: 1.2.0

Resolved by:
https://github.com/apache/spark/pull/2613

> mvn clean doesn't delete some files
> ---
>
> Key: SPARK-3757
> URL: https://issues.apache.org/jira/browse/SPARK-3757
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
>Reporter: Masayoshi TSUZUKI
>Priority: Trivial
> Fix For: 1.2.0
>
>
> When we build Spark using {{mvn package}},
> {{/python/lib/py4j-0.8.2.1-src.zip}} is unzipped and some py4j files (*.py) 
> are created in {{/python/build/}}.
> But this directory and these files are not deleted by {{mvn clean}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3761) Class not found exception / sbt 13.5 / Scala 2.10.4

2014-10-01 Thread Igor Tkachenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154982#comment-14154982
 ] 

Igor Tkachenko commented on SPARK-3761:
---

if I change code to:
val count2 = sc .textFile("hdfs://:8020/tmp/data/risk/account.txt")
//.filter(line => line.contains("Word")) it seems it can't work with 
anonymous function
.count()

it does work!

> Class not found exception / sbt 13.5 / Scala 2.10.4
> ---
>
> Key: SPARK-3761
> URL: https://issues.apache.org/jira/browse/SPARK-3761
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Igor Tkachenko
>
> I have Scala code:
> val master = "spark://:7077"
> val sc = new SparkContext(new SparkConf()
>   .setMaster(master)
>   .setAppName("SparkQueryDemo 01")
>   .set("spark.executor.memory", "512m"))
> val count2 = sc .textFile("hdfs:// address>:8020/tmp/data/risk/account.txt")
>   .filter(line  => line.contains("Word"))
>   .count()
> I've got such an error:
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0:0 failed 4 times, most
> recent failure: Exception failure in TID 6 on host : 
> java.lang.ClassNotFoundExcept
> ion: SimpleApp$$anonfun$1
> My dependencies :
> object Version {
>   val spark= "1.0.0-cdh5.1.0"
>   val hadoop   = "2.4.1"
>   val slf4j= "1.7.6"
>   val logback  = "1.1.1"
>   val scalaTest= "2.1.0"
>   val mockito  = "1.9.5"
> }
> object Library {
>   val sparkCore  = "org.apache.spark"  %% "spark-assembly"  % 
> Version.spark
>   val hadoopClient   = "org.apache.hadoop" %  "hadoop-client"   % 
> Version.hadoop
>   val slf4jApi   = "org.slf4j" %  "slf4j-api"   % 
> Version.slf4j
>   val logbackClassic = "ch.qos.logback"%  "logback-classic" % 
> Version.logback
>   val scalaTest  = "org.scalatest" %% "scalatest"   % 
> Version.scalaTest
>   val mockitoAll = "org.mockito"   %  "mockito-all" % 
> Version.mockito
> }
> My OS is Win 7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3761) Class not found exception / sbt 13.5 / Scala 2.10.4

2014-10-01 Thread Igor Tkachenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Tkachenko updated SPARK-3761:
--
Description: 
I have Scala code:

val master = "spark://:7077"

val sc = new SparkContext(new SparkConf()
  .setMaster(master)
  .setAppName("SparkQueryDemo 01")
  .set("spark.executor.memory", "512m"))

val count2 = sc .textFile("hdfs://:8020/tmp/data/risk/account.txt")
  .filter(line  => line.contains("Word"))
  .count()

I've got such an error:
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0.0:0 failed 4 times, most
recent failure: Exception failure in TID 6 on host : 
java.lang.ClassNotFoundExcept
ion: SimpleApp$$anonfun$1

My dependencies :

object Version {
  val spark= "1.0.0-cdh5.1.0"
}

object Library {
  val sparkCore  = "org.apache.spark"  % "spark-assembly_2.10"  % 
Version.spark
}

My OS is Win 7

  was:
I have Scala code:

val master = "spark://:7077"

val sc = new SparkContext(new SparkConf()
  .setMaster(master)
  .setAppName("SparkQueryDemo 01")
  .set("spark.executor.memory", "512m"))

val count2 = sc .textFile("hdfs://:8020/tmp/data/risk/account.txt")
  .filter(line  => line.contains("Word"))
  .count()

I've got such an error:
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0.0:0 failed 4 times, most
recent failure: Exception failure in TID 6 on host : 
java.lang.ClassNotFoundExcept
ion: SimpleApp$$anonfun$1

My dependencies :

object Version {
  val spark= "1.0.0-cdh5.1.0"
  val hadoop   = "2.4.1"
  val slf4j= "1.7.6"
  val logback  = "1.1.1"
  val scalaTest= "2.1.0"
  val mockito  = "1.9.5"
}

object Library {
  val sparkCore  = "org.apache.spark"  %% "spark-assembly"  % Version.spark
  val hadoopClient   = "org.apache.hadoop" %  "hadoop-client"   % Version.hadoop
  val slf4jApi   = "org.slf4j" %  "slf4j-api"   % Version.slf4j
  val logbackClassic = "ch.qos.logback"%  "logback-classic" % 
Version.logback
  val scalaTest  = "org.scalatest" %% "scalatest"   % 
Version.scalaTest
  val mockitoAll = "org.mockito"   %  "mockito-all" % 
Version.mockito
}

My OS is Win 7


> Class not found exception / sbt 13.5 / Scala 2.10.4
> ---
>
> Key: SPARK-3761
> URL: https://issues.apache.org/jira/browse/SPARK-3761
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Igor Tkachenko
>
> I have Scala code:
> val master = "spark://:7077"
> val sc = new SparkContext(new SparkConf()
>   .setMaster(master)
>   .setAppName("SparkQueryDemo 01")
>   .set("spark.executor.memory", "512m"))
> val count2 = sc .textFile("hdfs:// address>:8020/tmp/data/risk/account.txt")
>   .filter(line  => line.contains("Word"))
>   .count()
> I've got such an error:
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0:0 failed 4 times, most
> recent failure: Exception failure in TID 6 on host : 
> java.lang.ClassNotFoundExcept
> ion: SimpleApp$$anonfun$1
> My dependencies :
> object Version {
>   val spark= "1.0.0-cdh5.1.0"
> }
> object Library {
>   val sparkCore  = "org.apache.spark"  % "spark-assembly_2.10"  % 
> Version.spark
> }
> My OS is Win 7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3761) Class anonfun$1 not found exception / sbt 13.5 / Scala 2.10.4

2014-10-01 Thread Igor Tkachenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Tkachenko updated SPARK-3761:
--
Summary: Class anonfun$1 not found exception / sbt 13.5 / Scala 2.10.4  
(was: Class not found exception / sbt 13.5 / Scala 2.10.4)

> Class anonfun$1 not found exception / sbt 13.5 / Scala 2.10.4
> -
>
> Key: SPARK-3761
> URL: https://issues.apache.org/jira/browse/SPARK-3761
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Igor Tkachenko
>
> I have Scala code:
> val master = "spark://:7077"
> val sc = new SparkContext(new SparkConf()
>   .setMaster(master)
>   .setAppName("SparkQueryDemo 01")
>   .set("spark.executor.memory", "512m"))
> val count2 = sc .textFile("hdfs:// address>:8020/tmp/data/risk/account.txt")
>   .filter(line  => line.contains("Word"))
>   .count()
> I've got such an error:
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0:0 failed 4 times, most
> recent failure: Exception failure in TID 6 on host : 
> java.lang.ClassNotFoundExcept
> ion: SimpleApp$$anonfun$1
> My dependencies :
> object Version {
>   val spark= "1.0.0-cdh5.1.0"
> }
> object Library {
>   val sparkCore  = "org.apache.spark"  % "spark-assembly_2.10"  % 
> Version.spark
> }
> My OS is Win 7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3761) Class anonfun$1 not found exception / sbt 13.5 / Scala 2.10.4

2014-10-01 Thread Igor Tkachenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154982#comment-14154982
 ] 

Igor Tkachenko edited comment on SPARK-3761 at 10/1/14 4:28 PM:


The code in the description does not work under sbt 13.6 too

if I change code to:
val count2 = sc .textFile("hdfs://:8020/tmp/data/risk/account.txt")
//.filter(line => line.contains("Word")) it seems it can't work with 
anonymous function
.count()

it does work!


was (Author: legart):
if I change code to:
val count2 = sc .textFile("hdfs://:8020/tmp/data/risk/account.txt")
//.filter(line => line.contains("Word")) it seems it can't work with 
anonymous function
.count()

it does work!

> Class anonfun$1 not found exception / sbt 13.5 / Scala 2.10.4
> -
>
> Key: SPARK-3761
> URL: https://issues.apache.org/jira/browse/SPARK-3761
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Igor Tkachenko
>
> I have Scala code:
> val master = "spark://:7077"
> val sc = new SparkContext(new SparkConf()
>   .setMaster(master)
>   .setAppName("SparkQueryDemo 01")
>   .set("spark.executor.memory", "512m"))
> val count2 = sc .textFile("hdfs:// address>:8020/tmp/data/risk/account.txt")
>   .filter(line  => line.contains("Word"))
>   .count()
> I've got such an error:
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0:0 failed 4 times, most
> recent failure: Exception failure in TID 6 on host : 
> java.lang.ClassNotFoundExcept
> ion: SimpleApp$$anonfun$1
> My dependencies :
> object Version {
>   val spark= "1.0.0-cdh5.1.0"
> }
> object Library {
>   val sparkCore  = "org.apache.spark"  % "spark-assembly_2.10"  % 
> Version.spark
> }
> My OS is Win 7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3761) Class anonfun$1 not found exception / sbt 13.5 / Scala 2.10.4

2014-10-01 Thread Igor Tkachenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Tkachenko updated SPARK-3761:
--
Priority: Blocker  (was: Major)

> Class anonfun$1 not found exception / sbt 13.5 / Scala 2.10.4
> -
>
> Key: SPARK-3761
> URL: https://issues.apache.org/jira/browse/SPARK-3761
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Igor Tkachenko
>Priority: Blocker
>
> I have Scala code:
> val master = "spark://:7077"
> val sc = new SparkContext(new SparkConf()
>   .setMaster(master)
>   .setAppName("SparkQueryDemo 01")
>   .set("spark.executor.memory", "512m"))
> val count2 = sc .textFile("hdfs:// address>:8020/tmp/data/risk/account.txt")
>   .filter(line  => line.contains("Word"))
>   .count()
> I've got such an error:
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0:0 failed 4 times, most
> recent failure: Exception failure in TID 6 on host : 
> java.lang.ClassNotFoundExcept
> ion: SimpleApp$$anonfun$1
> My dependencies :
> object Version {
>   val spark= "1.0.0-cdh5.1.0"
> }
> object Library {
>   val sparkCore  = "org.apache.spark"  % "spark-assembly_2.10"  % 
> Version.spark
> }
> My OS is Win 7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3761) Class anonfun$1 not found exception / sbt 13.x / Scala 2.10.4

2014-10-01 Thread Igor Tkachenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Tkachenko updated SPARK-3761:
--
Summary: Class anonfun$1 not found exception / sbt 13.x / Scala 2.10.4  
(was: Class anonfun$1 not found exception / sbt 13.5 / Scala 2.10.4)

> Class anonfun$1 not found exception / sbt 13.x / Scala 2.10.4
> -
>
> Key: SPARK-3761
> URL: https://issues.apache.org/jira/browse/SPARK-3761
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Igor Tkachenko
>Priority: Blocker
>
> I have Scala code:
> val master = "spark://:7077"
> val sc = new SparkContext(new SparkConf()
>   .setMaster(master)
>   .setAppName("SparkQueryDemo 01")
>   .set("spark.executor.memory", "512m"))
> val count2 = sc .textFile("hdfs:// address>:8020/tmp/data/risk/account.txt")
>   .filter(line  => line.contains("Word"))
>   .count()
> I've got such an error:
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0:0 failed 4 times, most
> recent failure: Exception failure in TID 6 on host : 
> java.lang.ClassNotFoundExcept
> ion: SimpleApp$$anonfun$1
> My dependencies :
> object Version {
>   val spark= "1.0.0-cdh5.1.0"
> }
> object Library {
>   val sparkCore  = "org.apache.spark"  % "spark-assembly_2.10"  % 
> Version.spark
> }
> My OS is Win 7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3761) Class anonfun$1 not found exception / sbt 13.x / Scala 2.10.4

2014-10-01 Thread Igor Tkachenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154982#comment-14154982
 ] 

Igor Tkachenko edited comment on SPARK-3761 at 10/1/14 4:31 PM:


The code in the description does not work under sbt 13.6 too

if I change code to:
val count2 = sc .textFile("hdfs://:8020/tmp/data/risk/account.txt")
//.filter(line => line.contains("Word")) it seems it can't work with 
anonymous function
.count()

Without filter function with lambda inside, it does work!


was (Author: legart):
The code in the description does not work under sbt 13.6 too

if I change code to:
val count2 = sc .textFile("hdfs://:8020/tmp/data/risk/account.txt")
//.filter(line => line.contains("Word")) it seems it can't work with 
anonymous function
.count()

it does work!

> Class anonfun$1 not found exception / sbt 13.x / Scala 2.10.4
> -
>
> Key: SPARK-3761
> URL: https://issues.apache.org/jira/browse/SPARK-3761
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Igor Tkachenko
>Priority: Blocker
>
> I have Scala code:
> val master = "spark://:7077"
> val sc = new SparkContext(new SparkConf()
>   .setMaster(master)
>   .setAppName("SparkQueryDemo 01")
>   .set("spark.executor.memory", "512m"))
> val count2 = sc .textFile("hdfs:// address>:8020/tmp/data/risk/account.txt")
>   .filter(line  => line.contains("Word"))
>   .count()
> I've got such an error:
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0:0 failed 4 times, most
> recent failure: Exception failure in TID 6 on host : 
> java.lang.ClassNotFoundExcept
> ion: SimpleApp$$anonfun$1
> My dependencies :
> object Version {
>   val spark= "1.0.0-cdh5.1.0"
> }
> object Library {
>   val sparkCore  = "org.apache.spark"  % "spark-assembly_2.10"  % 
> Version.spark
> }
> My OS is Win 7, sbt 13.5, Scala 2.10.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3761) Class anonfun$1 not found exception / sbt 13.x / Scala 2.10.4

2014-10-01 Thread Igor Tkachenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Tkachenko updated SPARK-3761:
--
Description: 
I have Scala code:

val master = "spark://:7077"

val sc = new SparkContext(new SparkConf()
  .setMaster(master)
  .setAppName("SparkQueryDemo 01")
  .set("spark.executor.memory", "512m"))

val count2 = sc .textFile("hdfs://:8020/tmp/data/risk/account.txt")
  .filter(line  => line.contains("Word"))
  .count()

I've got such an error:
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0.0:0 failed 4 times, most
recent failure: Exception failure in TID 6 on host : 
java.lang.ClassNotFoundExcept
ion: SimpleApp$$anonfun$1

My dependencies :

object Version {
  val spark= "1.0.0-cdh5.1.0"
}

object Library {
  val sparkCore  = "org.apache.spark"  % "spark-assembly_2.10"  % 
Version.spark
}

My OS is Win 7, sbt 13.5, Scala 2.10.4

  was:
I have Scala code:

val master = "spark://:7077"

val sc = new SparkContext(new SparkConf()
  .setMaster(master)
  .setAppName("SparkQueryDemo 01")
  .set("spark.executor.memory", "512m"))

val count2 = sc .textFile("hdfs://:8020/tmp/data/risk/account.txt")
  .filter(line  => line.contains("Word"))
  .count()

I've got such an error:
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0.0:0 failed 4 times, most
recent failure: Exception failure in TID 6 on host : 
java.lang.ClassNotFoundExcept
ion: SimpleApp$$anonfun$1

My dependencies :

object Version {
  val spark= "1.0.0-cdh5.1.0"
}

object Library {
  val sparkCore  = "org.apache.spark"  % "spark-assembly_2.10"  % 
Version.spark
}

My OS is Win 7


> Class anonfun$1 not found exception / sbt 13.x / Scala 2.10.4
> -
>
> Key: SPARK-3761
> URL: https://issues.apache.org/jira/browse/SPARK-3761
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Igor Tkachenko
>Priority: Blocker
>
> I have Scala code:
> val master = "spark://:7077"
> val sc = new SparkContext(new SparkConf()
>   .setMaster(master)
>   .setAppName("SparkQueryDemo 01")
>   .set("spark.executor.memory", "512m"))
> val count2 = sc .textFile("hdfs:// address>:8020/tmp/data/risk/account.txt")
>   .filter(line  => line.contains("Word"))
>   .count()
> I've got such an error:
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0:0 failed 4 times, most
> recent failure: Exception failure in TID 6 on host : 
> java.lang.ClassNotFoundExcept
> ion: SimpleApp$$anonfun$1
> My dependencies :
> object Version {
>   val spark= "1.0.0-cdh5.1.0"
> }
> object Library {
>   val sparkCore  = "org.apache.spark"  % "spark-assembly_2.10"  % 
> Version.spark
> }
> My OS is Win 7, sbt 13.5, Scala 2.10.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-10-01 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155097#comment-14155097
 ] 

Matt Cheah commented on SPARK-1860:
---

This might be a silly question, but are we guaranteed that the application 
folder will always be labeled by appid? I looked at ExecutorRunner and it 
certainly generates the folder by application ID and executor ID, but code 
comments in ExecutorRunner indicate it is only used by the standalone cluster 
mode. Hence I didn't tie any logic to the actual naming of the folders.

> Standalone Worker cleanup should not clean up running executors
> ---
>
> Key: SPARK-1860
> URL: https://issues.apache.org/jira/browse/SPARK-1860
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.0
>Reporter: Aaron Davidson
>Priority: Blocker
>
> The default values of the standalone worker cleanup code clean up all 
> application data every 7 days. This includes jars that were added to any 
> executors that happen to be running for longer than 7 days, hitting streaming 
> jobs especially hard.
> Executors' log/data folders should not be cleaned up if they're still 
> running. Until then, this behavior should not be enabled by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3706) Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset

2014-10-01 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155219#comment-14155219
 ] 

Josh Rosen commented on SPARK-3706:
---

Do you know whether this problem occurs in the released version of 1.1.0 or in 
branch-1.1?  It looks like you tested this with 
b235e013638685758885842dc3268e9800af3678, which is part of the master branch 
(1.2.0).

I tried {{IPYTHON=1 ./bin/pyspark}} with one of the 1.1.0 binary distributions 
and it worked as expected.  Therefore, I'm going to change the "Affects 
Versions" to 1.2.0.  Let me know if you can reproduce this issue on 1.1.0, 
since we should backport a fix if this represents a regression between 1.0.2 
and 1.1.0.

> Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset
> 
>
> Key: SPARK-3706
> URL: https://issues.apache.org/jira/browse/SPARK-3706
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0
>Reporter: cocoatomo
>  Labels: pyspark
>
> h3. Problem
> The section "Using the shell" in Spark Programming Guide 
> (https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) 
> says that we can run pyspark REPL through IPython.
> But the following command does not run IPython but the default Python executable.
> {quote}
> $ IPYTHON=1 ./bin/pyspark
> Python 2.7.8 (default, Jul  2 2014, 10:14:46) 
> ...
> {quote}
> The spark/bin/pyspark script at commit 
> b235e013638685758885842dc3268e9800af3678 decides which executable and options 
> to use in the following way.
> # if PYSPARK_PYTHON unset
> #* → defaulting to "python"
> # if IPYTHON_OPTS set
> #* → set IPYTHON "1"
> # if a Python script is passed to ./bin/pyspark → run it with ./bin/spark-submit
> #* out of this issue's scope
> # if IPYTHON set as "1"
> #* → execute $PYSPARK_PYTHON (default: ipython) with arguments $IPYTHON_OPTS
> #* otherwise execute $PYSPARK_PYTHON
> Therefore, when PYSPARK_PYTHON is unset, python is executed even though IPYTHON is 
> "1".
> In other words, when PYSPARK_PYTHON is unset, IPYTHON_OPTS and IPYTHON have no 
> effect on deciding which command to use.
> ||PYSPARK_PYTHON||IPYTHON_OPTS||IPYTHON||resulting command||expected command||
> |(unset → defaults to python)|(unset)|(unset)|python|(same)|
> |(unset → defaults to python)|(unset)|1|python|ipython|
> |(unset → defaults to python)|an_option|(unset → set to 1)|python 
> an_option|ipython an_option|
> |(unset → defaults to python)|an_option|1|python an_option|ipython an_option|
> |ipython|(unset)|(unset)|ipython|(same)|
> |ipython|(unset)|1|ipython|(same)|
> |ipython|an_option|(unset → set to 1)|ipython an_option|(same)|
> |ipython|an_option|1|ipython an_option|(same)|
> h3. Suggestion
> The pyspark script should first determine whether a user wants to run 
> IPython or another executable.
> # if IPYTHON_OPTS set
> #* set IPYTHON "1"
> # if IPYTHON has a value "1"
> #* PYSPARK_PYTHON defaults to "ipython" if not set
> # PYSPARK_PYTHON defaults to "python" if not set
> See the pull request for more detailed modification.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3706) Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset

2014-10-01 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-3706:
--
Affects Version/s: (was: 1.1.0)
   1.2.0

> Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset
> 
>
> Key: SPARK-3706
> URL: https://issues.apache.org/jira/browse/SPARK-3706
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0
>Reporter: cocoatomo
>  Labels: pyspark
>
> h3. Problem
> The section "Using the shell" in Spark Programming Guide 
> (https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) 
> says that we can run pyspark REPL through IPython.
> But the following command does not run IPython but the default Python executable.
> {quote}
> $ IPYTHON=1 ./bin/pyspark
> Python 2.7.8 (default, Jul  2 2014, 10:14:46) 
> ...
> {quote}
> The spark/bin/pyspark script at commit 
> b235e013638685758885842dc3268e9800af3678 decides which executable and options 
> to use in the following way.
> # if PYSPARK_PYTHON unset
> #* → defaulting to "python"
> # if IPYTHON_OPTS set
> #* → set IPYTHON "1"
> # if a Python script is passed to ./bin/pyspark → run it with ./bin/spark-submit
> #* out of this issue's scope
> # if IPYTHON set as "1"
> #* → execute $PYSPARK_PYTHON (default: ipython) with arguments $IPYTHON_OPTS
> #* otherwise execute $PYSPARK_PYTHON
> Therefore, when PYSPARK_PYTHON is unset, python is executed even though IPYTHON is 
> "1".
> In other words, when PYSPARK_PYTHON is unset, IPYTHON_OPTS and IPYTHON have no 
> effect on deciding which command to use.
> ||PYSPARK_PYTHON||IPYTHON_OPTS||IPYTHON||resulting command||expected command||
> |(unset → defaults to python)|(unset)|(unset)|python|(same)|
> |(unset → defaults to python)|(unset)|1|python|ipython|
> |(unset → defaults to python)|an_option|(unset → set to 1)|python 
> an_option|ipython an_option|
> |(unset → defaults to python)|an_option|1|python an_option|ipython an_option|
> |ipython|(unset)|(unset)|ipython|(same)|
> |ipython|(unset)|1|ipython|(same)|
> |ipython|an_option|(unset → set to 1)|ipython an_option|(same)|
> |ipython|an_option|1|ipython an_option|(same)|
> h3. Suggestion
> The pyspark script should first determine whether a user wants to run 
> IPython or another executable.
> # if IPYTHON_OPTS set
> #* set IPYTHON "1"
> # if IPYTHON has a value "1"
> #* PYSPARK_PYTHON defaults to "ipython" if not set
> # PYSPARK_PYTHON defaults to "python" if not set
> See the pull request for more detailed modification.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3749) Bugs in broadcast of large RDD

2014-10-01 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-3749:
--
Affects Version/s: 1.2.0

> Bugs in broadcast  of large RDD
> ---
>
> Key: SPARK-3749
> URL: https://issues.apache.org/jira/browse/SPARK-3749
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>Priority: Critical
>
> 1. broadcast is triggered unexpectedly
> 2. a file descriptor is leaked in the JVM
> 3. the broadcast is not unpersisted in the JVM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3749) Bugs in broadcast of large RDD

2014-10-01 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3749.
---
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2603
[https://github.com/apache/spark/pull/2603]

> Bugs in broadcast  of large RDD
> ---
>
> Key: SPARK-3749
> URL: https://issues.apache.org/jira/browse/SPARK-3749
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>Priority: Critical
> Fix For: 1.2.0
>
>
> 1. broadcast is triggered unexpectedly
> 2. a file descriptor is leaked in the JVM
> 3. the broadcast is not unpersisted in the JVM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2626) Stop SparkContext in all examples

2014-10-01 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-2626.
---
Resolution: Fixed

Issue resolved by pull request 2575
[https://github.com/apache/spark/pull/2575]

> Stop SparkContext in all examples
> -
>
> Key: SPARK-2626
> URL: https://issues.apache.org/jira/browse/SPARK-2626
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.1
>Reporter: Andrew Or
>  Labels: starter
> Fix For: 1.2.0
>
>
> Event logs rely on sc.stop() to close the file. If this is never closed, the 
> history server will not be able to find the logs.
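
A minimal sketch of the pattern the examples need; the app and computation below are 
illustrative, not taken from a specific example:

{code}
import org.apache.spark.{SparkConf, SparkContext}

object ExampleApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ExampleApp"))
    try {
      println(sc.parallelize(1 to 1000).sum())
    } finally {
      // Stopping the context closes the event log file, so the history
      // server can later find and read it.
      sc.stop()
    }
  }
}
{code}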



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-3755) Do not bind port 1 - 1024 to server in spark

2014-10-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-3755.

   Resolution: Fixed
Fix Version/s: 1.2.0
 Assignee: wangfei

> Do not bind port 1 - 1024 to server in spark
> 
>
> Key: SPARK-3755
> URL: https://issues.apache.org/jira/browse/SPARK-3755
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: wangfei
>Assignee: wangfei
> Fix For: 1.2.0
>
>
> A non-root user using a port in the range 1-1024 to start the Jetty server will 
> get the exception "java.net.SocketException: Permission denied"
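
A minimal sketch of the kind of guard this implies; the helper below is hypothetical, 
not Spark's actual implementation:

{code}
// Ports below 1024 require root on most systems, so reject them up front
// instead of failing later with java.net.SocketException: Permission denied.
def checkServerPort(port: Int): Unit = {
  require(port == 0 || (port >= 1024 && port <= 65535),
    s"Invalid port $port: use 0 for an ephemeral port or a value in 1024-65535")
}
{code}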



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2626) Stop SparkContext in all examples

2014-10-01 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-2626:
--
Fix Version/s: 1.1.1

> Stop SparkContext in all examples
> -
>
> Key: SPARK-2626
> URL: https://issues.apache.org/jira/browse/SPARK-2626
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.1
>Reporter: Andrew Or
>  Labels: starter
> Fix For: 1.1.1, 1.2.0
>
>
> Event logs rely on sc.stop() to close the file. If this is never closed, the 
> history server will not be able to find the logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2626) Stop SparkContext in all examples

2014-10-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-2626:
-
Assignee: Sean Owen

> Stop SparkContext in all examples
> -
>
> Key: SPARK-2626
> URL: https://issues.apache.org/jira/browse/SPARK-2626
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.1
>Reporter: Andrew Or
>Assignee: Sean Owen
>  Labels: starter
> Fix For: 1.1.1, 1.2.0
>
>
> Event logs rely on sc.stop() to close the file. If this is never closed, the 
> history server will not be able to find the logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3756) Include possible MultiException when detecting port collisions

2014-10-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-3756.

   Resolution: Fixed
Fix Version/s: 1.2.0
   1.1.1
 Assignee: wangfei

https://github.com/apache/spark/pull/2611

> Include possible MultiException when detecting port collisions
> --
>
> Key: SPARK-3756
> URL: https://issues.apache.org/jira/browse/SPARK-3756
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: wangfei
>Assignee: wangfei
> Fix For: 1.1.1, 1.2.0
>
>
> A tiny bug in the method isBindCollision.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3756) Include possible MultiException when detecting port collisions

2014-10-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3756:
---
Summary: Include possible MultiException when detecting port collisions  
(was: check exception is caused by an address-port collision properly)

> Include possible MultiException when detecting port collisions
> --
>
> Key: SPARK-3756
> URL: https://issues.apache.org/jira/browse/SPARK-3756
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: wangfei
> Fix For: 1.1.1, 1.2.0
>
>
> A tiny bug in the method isBindCollision.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-10-01 Thread Aaron Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155303#comment-14155303
 ] 

Aaron Davidson commented on SPARK-1860:
---

The Worker itself is solely a Standalone mode construct, so AFAIK, this is not 
an issue.

> Standalone Worker cleanup should not clean up running executors
> ---
>
> Key: SPARK-1860
> URL: https://issues.apache.org/jira/browse/SPARK-1860
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.0
>Reporter: Aaron Davidson
>Priority: Blocker
>
> The default values of the standalone worker cleanup code clean up all 
> application data every 7 days. This includes jars that were added to any 
> executors that happen to be running for longer than 7 days, hitting streaming 
> jobs especially hard.
> Executors' log/data folders should not be cleaned up if they're still 
> running. Until then, this behavior should not be enabled by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2693) Support for UDAF Hive Aggregates like PERCENTILE

2014-10-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155388#comment-14155388
 ] 

Apache Spark commented on SPARK-2693:
-

User 'ravipesala' has created a pull request for this issue:
https://github.com/apache/spark/pull/2620

> Support for UDAF Hive Aggregates like PERCENTILE
> 
>
> Key: SPARK-2693
> URL: https://issues.apache.org/jira/browse/SPARK-2693
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Ravindra Pesala
>Priority: Critical
>
> {code}
> SELECT MIN(field1), MAX(field2), AVG(field3), PERCENTILE(field4), 
> year,month,day FROM  raw_data_table  GROUP BY year, month, day
> MIN, MAX and AVG functions work fine for me, but with PERCENTILE, I get an 
> error as shown below.
> Exception in thread "main" java.lang.RuntimeException: No handler for udf 
> class org.apache.hadoop.hive.ql.udf.UDAFPercentile
> at scala.sys.package$.error(package.scala:27)
> at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$.lookupFunction(hiveUdfs.scala:69)
> at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:115)
> at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:113)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
> {code}
> This aggregate extends UDAF, which we don't yet have a wrapper for.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3633) Fetches failure observed after SPARK-2711

2014-10-01 Thread Chen Song (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155415#comment-14155415
 ] 

Chen Song commented on SPARK-3633:
--

I tried increasing the timeout for the property 
spark.core.connection.ack.wait.timeout. I saw fewer fetch failures due to ack 
timeouts, but they still exist.

I also tried relaxing the following properties, but none of them seems to help.

spark.core.connection.handler.threads.*
spark.core.connection.io.threads.*
spark.core.connection.connect.threads.*
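
For reference, a minimal sketch of how the ack timeout is typically raised; the 
600-second value is an illustrative guess, not a recommendation:

{code}
import org.apache.spark.SparkConf

// Set before creating the SparkContext; the value is in seconds.
val conf = new SparkConf()
  .set("spark.core.connection.ack.wait.timeout", "600")
{code}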




> Fetches failure observed after SPARK-2711
> -
>
> Key: SPARK-3633
> URL: https://issues.apache.org/jira/browse/SPARK-3633
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.1.0
>Reporter: Nishkam Ravi
>
> Running a variant of PageRank on a 6-node cluster with a 30Gb input dataset. 
> Recently upgraded to Spark 1.1. The workload fails with the following error 
> message(s):
> {code}
> 14/09/19 12:10:38 WARN TaskSetManager: Lost task 51.0 in stage 2.1 (TID 552, 
> c1705.halxg.cloudera.com): FetchFailed(BlockManagerId(1, 
> c1706.halxg.cloudera.com, 49612, 0), shuffleId=3, mapId=75, reduceId=120)
> 14/09/19 12:10:38 INFO DAGScheduler: Resubmitting failed stages
> {code}
> In order to identify the problem, I carried out change set analysis. As I go 
> back in time, the error message changes to:
> {code}
> 14/09/21 12:56:54 WARN TaskSetManager: Lost task 35.0 in stage 3.0 (TID 519, 
> c1706.halxg.cloudera.com): java.io.FileNotFoundException: 
> /var/lib/jenkins/workspace/tmp/spark-local-20140921123257-68ee/1c/temp_3a1ade13-b48a-437a-a466-673995304034
>  (Too many open files)
> java.io.FileOutputStream.open(Native Method)
> java.io.FileOutputStream.(FileOutputStream.java:221)
> 
> org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:117)
> 
> org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:185)
> 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:197)
> 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:145)
> org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
> 
> org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:51)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:54)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {code}
> All the way until Aug 4th. Turns out the problem changeset is 4fde28c. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3633) Fetches failure observed after SPARK-2711

2014-10-01 Thread Chen Song (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155415#comment-14155415
 ] 

Chen Song edited comment on SPARK-3633 at 10/1/14 8:15 PM:
---

I tried increasing the timeout for the property 
spark.core.connection.ack.wait.timeout. I saw fewer fetch failures due to ack 
timeouts, but they still exist.

I also tried relaxing the following properties, but none of them seems to help.

spark.core.connection.handler.threads.*
spark.core.connection.io.threads.*
spark.core.connection.connect.threads.*

My job runs only < 500 parallel reduce tasks, and I don't see much GC activity 
on the sender/receiver executor JVMs when the timeout happens.




was (Author: capricornius):
I tried increasing the timeout for the property 
spark.core.connection.ack.wait.timeout. I saw fewer fetch failures due to ack 
timeouts, but they still exist.

I also tried relaxing the following properties, but none of them seems to help.

spark.core.connection.handler.threads.*
spark.core.connection.io.threads.*
spark.core.connection.connect.threads.*




> Fetches failure observed after SPARK-2711
> -
>
> Key: SPARK-3633
> URL: https://issues.apache.org/jira/browse/SPARK-3633
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.1.0
>Reporter: Nishkam Ravi
>
> Running a variant of PageRank on a 6-node cluster with a 30Gb input dataset. 
> Recently upgraded to Spark 1.1. The workload fails with the following error 
> message(s):
> {code}
> 14/09/19 12:10:38 WARN TaskSetManager: Lost task 51.0 in stage 2.1 (TID 552, 
> c1705.halxg.cloudera.com): FetchFailed(BlockManagerId(1, 
> c1706.halxg.cloudera.com, 49612, 0), shuffleId=3, mapId=75, reduceId=120)
> 14/09/19 12:10:38 INFO DAGScheduler: Resubmitting failed stages
> {code}
> In order to identify the problem, I carried out change set analysis. As I go 
> back in time, the error message changes to:
> {code}
> 14/09/21 12:56:54 WARN TaskSetManager: Lost task 35.0 in stage 3.0 (TID 519, 
> c1706.halxg.cloudera.com): java.io.FileNotFoundException: 
> /var/lib/jenkins/workspace/tmp/spark-local-20140921123257-68ee/1c/temp_3a1ade13-b48a-437a-a466-673995304034
>  (Too many open files)
> java.io.FileOutputStream.open(Native Method)
> java.io.FileOutputStream.(FileOutputStream.java:221)
> 
> org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:117)
> 
> org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:185)
> 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:197)
> 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:145)
> org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
> 
> org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:51)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:54)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {code}
> All the way until Aug 4th. Turns out the problem changeset is 4fde28c. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3752) Spark SQL needs more exhaustive tests for definite Hive UDF's

2014-10-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155527#comment-14155527
 ] 

Apache Spark commented on SPARK-3752:
-

User 'vidaha' has created a pull request for this issue:
https://github.com/apache/spark/pull/2621

> Spark SQL needs more exhaustive tests for definite Hive UDF's
> -
>
> Key: SPARK-3752
> URL: https://issues.apache.org/jira/browse/SPARK-3752
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Vida Ha
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs

2014-10-01 Thread Tathagata Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1410#comment-1410
 ] 

Tathagata Das edited comment on SPARK-3292 at 10/1/14 9:18 PM:
---

I mentioned this in the PR, but I am adding it here as well. Not returning an 
RDD can mess up a lot of the logic and semantics. For example, if there is a 
transform() followed by updateStateByKey(), the result will be unpredictable. 
updateStateByKey expects the previous batch to have a state RDD. If it does not 
find any state RDD, it will assume that this is the start of the streaming 
computation and effectively initialize again, forgetting the previous states 
from 2 batches ago. So this change is incorrect.

Regarding the original problem of creating too many empty files, you can filter 
those out by doing the saving explicitly yourself.

dstream.foreachRDD { case (rdd, time) => if (rdd.take(1).size == 1) 
rdd.saveAsHadoopFile() }


was (Author: tdas):
I mentioned this in the PR, but I am adding it here as well. Not returning an 
RDD can mess up a lot of the logic and semantics. For example, if there is a 
transform() followed by updateStateByKey(), the result will be unpredictable. 
updateStateByKey expects the previous batch to have a state RDD. If it does not 
find any state RDD, it will assume that this is the start of the streaming 
computation and effectively initialize again, forgetting the previous states 
from 2 batches ago. So this change is incorrect.

Regarding the original problem of creating too many empty files, you can filter 
those out by doing the saving explicitly yourself.

dstream.foreachRDD { case (rdd, time) => 
if (rdd.take(1).size == 1) {
   rdd.saveAsHadoopFile()
   }
}

> Shuffle Tasks run incessantly even though there's no inputs
> ---
>
> Key: SPARK-3292
> URL: https://issues.apache.org/jira/browse/SPARK-3292
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.2
>Reporter: guowei
>
> Such as repartition, groupBy, join and cogroup, for example.
> If I want to save the shuffle outputs as Hadoop files, many empty files are 
> generated as well, even though there is no input.
> It's too expensive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs

2014-10-01 Thread Tathagata Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1410#comment-1410
 ] 

Tathagata Das commented on SPARK-3292:
--

I mentioned this in the PR, but I am adding it here as well. Not returning an 
RDD can mess up a lot of the logic and semantics. For example, if there is a 
transform() followed by updateStateByKey(), the result will be unpredictable. 
updateStateByKey expects the previous batch to have a state RDD. If it does not 
find any state RDD, it will assume that this is the start of the streaming 
computation and effectively initialize again, forgetting the previous states 
from 2 batches ago. So this change is incorrect.

Regarding the original problem of creating too many empty files, you can filter 
those out by doing the saving explicitly yourself.

dstream.foreachRDD { case (rdd, time) => 
if (rdd.take(1).size == 1) {
   rdd.saveAsHadoopFile()
   }
}
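
A slightly fuller sketch of that filtering approach, with assumed types and a 
hypothetical output prefix:

{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

// Assumes a DStream[String]; only batches that actually contain data are written out.
def saveNonEmptyBatches(stream: DStream[String], outputPrefix: String): Unit = {
  stream.foreachRDD { (rdd: RDD[String], time: Time) =>
    if (rdd.take(1).nonEmpty) {
      rdd.saveAsTextFile(s"$outputPrefix-${time.milliseconds}")
    }
  }
}
{code}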

> Shuffle Tasks run incessantly even though there's no inputs
> ---
>
> Key: SPARK-3292
> URL: https://issues.apache.org/jira/browse/SPARK-3292
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.2
>Reporter: guowei
>
> Such as repartition, groupBy, join and cogroup, for example.
> If I want to save the shuffle outputs as Hadoop files, many empty files are 
> generated as well, even though there is no input.
> It's too expensive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3746) Failure to lock hive client when creating tables

2014-10-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3746.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2598
[https://github.com/apache/spark/pull/2598]

> Failure to lock hive client when creating tables
> 
>
> Key: SPARK-3746
> URL: https://issues.apache.org/jira/browse/SPARK-3746
> Project: Spark
>  Issue Type: Bug
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
> Fix For: 1.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3658) Take thrift server as a daemon

2014-10-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3658.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2509
[https://github.com/apache/spark/pull/2509]

> Take thrift server as a daemon
> --
>
> Key: SPARK-3658
> URL: https://issues.apache.org/jira/browse/SPARK-3658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: WangTaoTheTonic
>Priority: Minor
> Fix For: 1.2.0
>
>
> Run the Thrift server as a daemon working in the background, so we can log it 
> while it is running and stop it with scripts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3708) Backticks aren't handled correctly in aliases

2014-10-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3708.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2594
[https://github.com/apache/spark/pull/2594]

> Backticks aren't handled correctly in aliases
> -
>
> Key: SPARK-3708
> URL: https://issues.apache.org/jira/browse/SPARK-3708
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Michael Armbrust
> Fix For: 1.2.0
>
>
> Here's a failing test case:
> {code}
> sql("SELECT k FROM (SELECT `key` AS `k` FROM src) a")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3705) add case for VoidObjectInspector in inspectorToDataType

2014-10-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3705.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2552
[https://github.com/apache/spark/pull/2552]

> add case for VoidObjectInspector in inspectorToDataType
> ---
>
> Key: SPARK-3705
> URL: https://issues.apache.org/jira/browse/SPARK-3705
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: wangfei
> Fix For: 1.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3593) Support Sorting of Binary Type Data

2014-10-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3593.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2617
[https://github.com/apache/spark/pull/2617]

> Support Sorting of Binary Type Data
> ---
>
> Key: SPARK-3593
> URL: https://issues.apache.org/jira/browse/SPARK-3593
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Paul Magid
> Fix For: 1.2.0
>
>
> If you try sorting on a binary field you currently get an exception.   Please 
> add support for binary data type sorting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-3755) Do not bind port 1 - 1024 to server in spark

2014-10-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-3755:


Original fix was broken so I'm re-opening it.

> Do not bind port 1 - 1024 to server in spark
> 
>
> Key: SPARK-3755
> URL: https://issues.apache.org/jira/browse/SPARK-3755
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: wangfei
>Assignee: wangfei
> Fix For: 1.2.0
>
>
> A non-root user using a port in the range 1-1024 to start the Jetty server will 
> get the exception "java.net.SocketException: Permission denied"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3755) Do not bind port 1 - 1024 to server in spark

2014-10-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155777#comment-14155777
 ] 

Apache Spark commented on SPARK-3755:
-

User 'scwf' has created a pull request for this issue:
https://github.com/apache/spark/pull/2623

> Do not bind port 1 - 1024 to server in spark
> 
>
> Key: SPARK-3755
> URL: https://issues.apache.org/jira/browse/SPARK-3755
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: wangfei
>Assignee: wangfei
> Fix For: 1.2.0
>
>
> A non-root user using a port in the range 1-1024 to start the Jetty server will 
> get the exception "java.net.SocketException: Permission denied"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3729) Null-pointer when constructing a HiveContext when settings are present

2014-10-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3729.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2583
[https://github.com/apache/spark/pull/2583]

> Null-pointer when constructing a HiveContext when settings are present
> --
>
> Key: SPARK-3729
> URL: https://issues.apache.org/jira/browse/SPARK-3729
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
> Fix For: 1.2.0
>
>
> {code}
> java.lang.NullPointerException
>   at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:307)
>   at 
> org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:270)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:242)
>   at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:79)
>   at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:78)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at org.apache.spark.sql.SQLContext.(SQLContext.scala:78)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:76)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3704) the types not match adding value form spark row to hive row in SparkSQLOperationManager

2014-10-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3704.
-
Resolution: Fixed

Issue resolved by pull request 2551
[https://github.com/apache/spark/pull/2551]

> the types not match adding value form spark row to hive row in 
> SparkSQLOperationManager
> ---
>
> Key: SPARK-3704
> URL: https://issues.apache.org/jira/browse/SPARK-3704
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: wangfei
> Fix For: 1.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs

2014-10-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-3292.

Resolution: Won't Fix

Seems like this is a necessary feature of the current design and can be 
partially worked around by filtering in user space.

> Shuffle Tasks run incessantly even though there's no inputs
> ---
>
> Key: SPARK-3292
> URL: https://issues.apache.org/jira/browse/SPARK-3292
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.2
>Reporter: guowei
>
> Such as repartition, groupBy, join and cogroup, for example.
> If I want to save the shuffle outputs as Hadoop files, many empty files are 
> generated as well, even though there is no input.
> It's too expensive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3761) Class anonfun$1 not found exception / sbt 13.x / Scala 2.10.4

2014-10-01 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155829#comment-14155829
 ] 

Patrick Wendell commented on SPARK-3761:


Is this an sbt bug rather than a spark one? If it works in older versiosn of 
SBT but doesn't work in 13.x...

> Class anonfun$1 not found exception / sbt 13.x / Scala 2.10.4
> -
>
> Key: SPARK-3761
> URL: https://issues.apache.org/jira/browse/SPARK-3761
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Igor Tkachenko
>Priority: Blocker
>
> I have Scala code:
> val master = "spark://:7077"
> val sc = new SparkContext(new SparkConf()
>   .setMaster(master)
>   .setAppName("SparkQueryDemo 01")
>   .set("spark.executor.memory", "512m"))
> val count2 = sc .textFile("hdfs:// address>:8020/tmp/data/risk/account.txt")
>   .filter(line  => line.contains("Word"))
>   .count()
> I've got such an error:
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0:0 failed 4 times, most
> recent failure: Exception failure in TID 6 on host : 
> java.lang.ClassNotFoundExcept
> ion: SimpleApp$$anonfun$1
> My dependencies :
> object Version {
>   val spark= "1.0.0-cdh5.1.0"
> }
> object Library {
>   val sparkCore  = "org.apache.spark"  % "spark-assembly_2.10"  % 
> Version.spark
> }
> My OS is Win 7, sbt 13.5, Scala 2.10.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3761) Class anonfun$1 not found exception / sbt 13.x / Scala 2.10.4

2014-10-01 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155829#comment-14155829
 ] 

Patrick Wendell edited comment on SPARK-3761 at 10/2/14 12:03 AM:
--

Is this an sbt bug rather than a spark one? If it works in older versions of 
SBT but doesn't work in 13.x...


was (Author: pwendell):
Is this an sbt bug rather than a spark one? If it works in older versiosn of 
SBT but doesn't work in 13.x...

> Class anonfun$1 not found exception / sbt 13.x / Scala 2.10.4
> -
>
> Key: SPARK-3761
> URL: https://issues.apache.org/jira/browse/SPARK-3761
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Igor Tkachenko
>Priority: Blocker
>
> I have Scala code:
> val master = "spark://:7077"
> val sc = new SparkContext(new SparkConf()
>   .setMaster(master)
>   .setAppName("SparkQueryDemo 01")
>   .set("spark.executor.memory", "512m"))
> val count2 = sc .textFile("hdfs:// address>:8020/tmp/data/risk/account.txt")
>   .filter(line  => line.contains("Word"))
>   .count()
> I've got such an error:
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0:0 failed 4 times, most
> recent failure: Exception failure in TID 6 on host : 
> java.lang.ClassNotFoundExcept
> ion: SimpleApp$$anonfun$1
> My dependencies :
> object Version {
>   val spark= "1.0.0-cdh5.1.0"
> }
> object Library {
>   val sparkCore  = "org.apache.spark"  % "spark-assembly_2.10"  % 
> Version.spark
> }
> My OS is Win 7, sbt 13.5, Scala 2.10.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3759) SparkSubmitDriverBootstrapper should return exit code of driver process

2014-10-01 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155833#comment-14155833
 ] 

Patrick Wendell commented on SPARK-3759:


Thanks for reporting this Eric - do you want to make a pull request with the 
suggested change? We can merge it.

> SparkSubmitDriverBootstrapper should return exit code of driver process
> ---
>
> Key: SPARK-3759
> URL: https://issues.apache.org/jira/browse/SPARK-3759
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.1.0
> Environment: Linux, Windows, Scala/Java
>Reporter: Eric Eijkelenboom
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> SparkSubmitDriverBootstrapper.scala currently always returns exit code 0. 
> Instead, it should return the exit code of the driver process.
> Suggested code change in SparkSubmitDriverBootstrapper, line 157: 
> {code}
> val returnCode = process.waitFor()
> sys.exit(returnCode)
> {code}
> Workaround for this issue: 
> Instead of specifying 'driver.extra*' properties in spark-defaults.conf, pass 
> these properties to spark-submit directly. This will launch the driver 
> program without the use of SparkSubmitDriverBootstrapper. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3730) Any one else having building spark recently

2014-10-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-3730.

Resolution: Invalid

For this type of question, check out the Spark dev list. To answer your specific 
question, try running a clean first. I think we are hitting some Scala compiler 
bugs, but they often go away after a clean.

> Any one else having building spark recently
> ---
>
> Key: SPARK-3730
> URL: https://issues.apache.org/jira/browse/SPARK-3730
> Project: Spark
>  Issue Type: Question
>Reporter: Anant Daksh Asthana
>Priority: Minor
>
> I get an assertion error in 
> spark/core/src/main/scala/org/apache/spark/HttpServer.scala while trying to 
> build.
> I am building using 
> mvn -Pyarn -PHadoop-2.3 -DskipTests -Phive clean package
> Here is the error I get: http://pastebin.com/Shi43r53



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3731) RDD caching stops working in pyspark after some time

2014-10-01 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155844#comment-14155844
 ] 

Patrick Wendell commented on SPARK-3731:


Thanks for reporting this - can you reproduce by any chance in local mode with 
a dataset you can attach to the JIRA? That makes it much easier for us to track 
down.

> RDD caching stops working in pyspark after some time
> 
>
> Key: SPARK-3731
> URL: https://issues.apache.org/jira/browse/SPARK-3731
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 1.1.0
> Environment: Linux, 32bit, both in local mode or in standalone 
> cluster mode
>Reporter: Milan Straka
> Attachments: worker.log
>
>
> Consider a file F which when loaded with sc.textFile and cached takes up 
> slightly more than half of free memory for RDD cache.
> When in PySpark the following is executed:
>   1) a = sc.textFile(F)
>   2) a.cache().count()
>   3) b = sc.textFile(F)
>   4) b.cache().count()
> and then the following is repeated (for example 10 times):
>   a) a.unpersist().cache().count()
>   b) b.unpersist().cache().count()
> after some time, there are no RDDs cached in memory.
> Also, since that time, no other RDD ever gets cached (the worker always 
> reports something like "WARN CacheManager: Not enough space to cache 
> partition rdd_23_5 in memory! Free memory is 277478190 bytes.", even if 
> rdd_23_5 is ~50MB). The Executors tab of the Application Detail UI shows that 
> all executors have 0MB memory used (which is consistent with the CacheManager 
> warning).
> When doing the same in Scala, everything works perfectly.
> I understand that this is a vague description, but I do not know how to 
> describe the problem better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3638) Commons HTTP client dependency conflict in extras/kinesis-asl module

2014-10-01 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3638.
---
   Resolution: Fixed
Fix Version/s: 1.1.1
   1.2.0

Issue resolved by pull request 2535
[https://github.com/apache/spark/pull/2535]

> Commons HTTP client dependency conflict in extras/kinesis-asl module
> 
>
> Key: SPARK-3638
> URL: https://issues.apache.org/jira/browse/SPARK-3638
> Project: Spark
>  Issue Type: Bug
>  Components: Examples, Streaming
>Affects Versions: 1.1.0
>Reporter: Aniket Bhatnagar
>  Labels: dependencies
> Fix For: 1.2.0, 1.1.1
>
>
> Followed instructions as mentioned @ 
> https://github.com/apache/spark/blob/master/docs/streaming-kinesis-integration.md
>  and when running the example, I get the following error:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.http.impl.conn.DefaultClientConnectionOperator.(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
> at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
> at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.(PoolingClientConnectionManager.java:114)
> at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.(PoolingClientConnectionManager.java:99)
> at 
> com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
> at 
> com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
> at 
> com.amazonaws.http.AmazonHttpClient.(AmazonHttpClient.java:181)
> at 
> com.amazonaws.AmazonWebServiceClient.(AmazonWebServiceClient.java:119)
> at 
> com.amazonaws.AmazonWebServiceClient.(AmazonWebServiceClient.java:103)
> at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.(AmazonKinesisClient.java:136)
> at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.(AmazonKinesisClient.java:117)
> at 
> com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.(AmazonKinesisAsyncClient.java:132)
> I believe this is due to the dependency conflict as described @ 
> http://mail-archives.apache.org/mod_mbox/spark-dev/201409.mbox/%3ccajob8btdxks-7-spjj5jmnw0xsnrjwdpcqqtjht1hun6j4z...@mail.gmail.com%3E
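
A hedged sketch of the usual sbt-side workaround for this kind of httpclient conflict; 
the version numbers are assumptions, not the project's verified fix:

{code}
// build.sbt fragment (illustrative)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.1.0",
  // Pin a single httpclient version whose DefaultClientConnectionOperator
  // constructor matches what the AWS SDK expects.
  "org.apache.httpcomponents" % "httpclient" % "4.2.6" force()
)
{code}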



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3626) Replace AsyncRDDActions with a more general async. API

2014-10-01 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3626.
---
Resolution: Won't Fix

I'm closing this as "Won't Fix" for now.  The more general API (as described in 
my PR) has some confusing semantics RE: cancellation and may be more general 
than what most users want / need.

> Replace AsyncRDDActions with a more general async. API
> --
>
> Key: SPARK-3626
> URL: https://issues.apache.org/jira/browse/SPARK-3626
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> The experimental AsyncRDDActions APIs seem to only exist in order to enable 
> job cancellation.
> We've been considering extending these APIs to support progress monitoring, 
> but this would require stabilizing them so they're no longer 
> {{@Experimental}}.
> Instead, I propose to replace all of the AsyncRDDActions with a mechanism 
> based on job groups which allows arbitrary computations to be run in job 
> groups and supports cancellation / monitoring of Spark jobs launched from 
> those computations.
> (full design pending; see my GitHub PR for more details).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3446) FutureAction should expose the job ID

2014-10-01 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3446.
---
   Resolution: Fixed
Fix Version/s: 1.2.0

> FutureAction should expose the job ID
> -
>
> Key: SPARK-3446
> URL: https://issues.apache.org/jira/browse/SPARK-3446
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 1.2.0
>
>
> This is a follow up to SPARK-2636.
> The patch for that bug added a {{jobId}} method to {{SimpleFutureAction}}. 
> The problem is that {{SimpleFutureAction}} is not exposed through any 
> existing API; all the {{AsyncRDDActions}} methods return just 
> {{FutureAction}}. So clients have to restore to casting / isInstanceOf to be 
> able to use that.
> Exposing the {{jobId}} through {{FutureAction}} has extra complications, 
> though, because {{ComplexFutureAction}} also extends that class.
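
For context, a sketch of the casting clients currently have to resort to, assuming the 
Spark 1.x async API and the jobId accessor added by SPARK-2636:

{code}
import org.apache.spark.{SimpleFutureAction, SparkContext}
import org.apache.spark.SparkContext._   // brings the async RDD actions into scope in 1.x

// Hypothetical helper: recover the job ID from an async action when it is exposed.
def jobIdOf(sc: SparkContext): Option[Int] = {
  sc.parallelize(1 to 100).countAsync() match {
    case s: SimpleFutureAction[_] => Some(s.jobId)   // accessor from SPARK-2636
    case _                        => None            // e.g. a ComplexFutureAction
  }
}
{code}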



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3633) Fetches failure observed after SPARK-2711

2014-10-01 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156074#comment-14156074
 ] 

Nishkam Ravi commented on SPARK-3633:
-

For a different workload (variant of TeraSort), I see fetch failures in the 
standalone mode but not with YARN (with identical ulimit and timeout values). 
Wondering why this might be.


> Fetches failure observed after SPARK-2711
> -
>
> Key: SPARK-3633
> URL: https://issues.apache.org/jira/browse/SPARK-3633
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.1.0
>Reporter: Nishkam Ravi
>
> Running a variant of PageRank on a 6-node cluster with a 30Gb input dataset. 
> Recently upgraded to Spark 1.1. The workload fails with the following error 
> message(s):
> {code}
> 14/09/19 12:10:38 WARN TaskSetManager: Lost task 51.0 in stage 2.1 (TID 552, 
> c1705.halxg.cloudera.com): FetchFailed(BlockManagerId(1, 
> c1706.halxg.cloudera.com, 49612, 0), shuffleId=3, mapId=75, reduceId=120)
> 14/09/19 12:10:38 INFO DAGScheduler: Resubmitting failed stages
> {code}
> In order to identify the problem, I carried out change set analysis. As I go 
> back in time, the error message changes to:
> {code}
> 14/09/21 12:56:54 WARN TaskSetManager: Lost task 35.0 in stage 3.0 (TID 519, 
> c1706.halxg.cloudera.com): java.io.FileNotFoundException: 
> /var/lib/jenkins/workspace/tmp/spark-local-20140921123257-68ee/1c/temp_3a1ade13-b48a-437a-a466-673995304034
>  (Too many open files)
> java.io.FileOutputStream.open(Native Method)
> java.io.FileOutputStream.(FileOutputStream.java:221)
> 
> org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:117)
> 
> org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:185)
> 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:197)
> 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:145)
> org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
> 
> org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:51)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:54)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {code}
> This error appears all the way back to Aug 4th. It turns out the problem changeset is 4fde28c. 
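> (Aside, not a fix for the regression itself: when the symptom is "Too many 
> open files" with the hash-based shuffle, one commonly suggested mitigation is 
> shuffle file consolidation. A minimal sketch, assuming Spark 1.1 and launch 
> via spark-submit:)
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
>
> // Reduces the number of shuffle files the hash shuffle manager keeps open;
> // it does not address the regression introduced by the changeset above.
> val conf = new SparkConf()
>   .setAppName("terasort-variant")  // illustrative name
>   .set("spark.shuffle.consolidateFiles", "true")
> val sc = new SparkContext(conf)
> {code}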



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3730) Anyone else having trouble building Spark recently

2014-10-01 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156082#comment-14156082
 ] 

Anant Daksh Asthana commented on SPARK-3730:


Thanks Patrick.




> Anyone else having trouble building Spark recently
> ---
>
> Key: SPARK-3730
> URL: https://issues.apache.org/jira/browse/SPARK-3730
> Project: Spark
>  Issue Type: Question
>Reporter: Anant Daksh Asthana
>Priority: Minor
>
> I get an assertion error in 
> spark/core/src/main/scala/org/apache/spark/HttpServer.scala while trying to 
> build.
> I am building using 
> mvn -Pyarn -PHadoop-2.3 -DskipTests -Phive clean package
> Here is the error I get: http://pastebin.com/Shi43r53



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3105) Calling cache() after RDDs are pipelined has no effect in PySpark

2014-10-01 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156088#comment-14156088
 ] 

Josh Rosen commented on SPARK-3105:
---

I discussed this with [~tdas] and [~davies] today.

As it stands now, there's a small discrepancy between the Scala and Python 
semantics for cache() and persist().  In Scala, calling cache() or persist() 
both returns an RDD _and_ changes the persistence of the RDD instance it was 
called on, so running

{code}
val a = sc.parallelize(...).map()
val b = a.map(...)
a.count()
b.count()

a.cache()
a.count()
b.count()
{code}

will result in {{b.count()}} using a cached copy of {{a}}.

In Python, as described above, changing the persistence level of an RDD will 
not automatically cause that persistence change to be reflected in existing 
RDDs that were generated by transforming the original non-persisted RDD.

In all languages, calling cache() or persist() and performing subsequent 
transformations on the RDDs returned from cache() / persist() will work as 
expected.  

One scenario where the Python semantics might be annoying is in an IPython 
notebook.  Say that a user defines some base RDDs in cell #1, some transformed 
RDDs in cell #2, then performs actions in cell #3, calls cache() in cell #4, 
then goes back and re-runs cell #3.  In this case, the cache() won't have any 
effect.

On the one hand, I suppose we could argue that it's best to keep the semantics 
as close as possible across all languages.

Unfortunately, this would require us to make some large internal changes to 
PySpark in order to implement the Scala/Java semantics.  If we decide that this 
is worth pursuing, I can post a more detailed proposal outlining the necessary 
changes (the "tl;dr" of my proposed approach is "establish a one-to-one mapping 
between JavaRDDs and PySpark RDDs and defer the pipelining of Python functions 
to execution time").

In my interpretation, the Spark Programming Guide [seems to 
suggest|https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence]
 that persist() and cache() should modify the instance that they're called on 
rather than returning a new, different RDD which happens to be persisted:

{quote}
You can mark an RDD to be persisted using the persist() or cache() methods on 
it. 
{quote}

"Marking" an RDD sounds like it modifies the metadata of that RDD rather than 
returning a new one (maybe I'm reading too much into this, though).

Does anyone have strong opinions on whether we should change the PySpark 
semantics to match Scala's, or examples of real use-cases where PySpark's 
current cache() semantics are confusing?

> Calling cache() after RDDs are pipelined has no effect in PySpark
> -
>
> Key: SPARK-3105
> URL: https://issues.apache.org/jira/browse/SPARK-3105
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> PySpark's PipelinedRDD decides whether to pipeline transformations by 
> checking whether those transformations are pipelinable _at the time that the 
> PipelinedRDD objects are created_ rather than at the time that we invoke 
> actions.  This might lead to problems if we call {{cache()}} on an RDD after 
> it's already been used in a pipeline:
> {code}
> rdd1 = sc.parallelize(range(100)).map(lambda x: x)
> rdd2 = rdd1.map(lambda x: 2 * x)
> rdd1.cache()
> rdd2.collect()
> {code}
> When I run this code, I'd expect {{cache()}} to break the pipeline and cache 
> intermediate results, but instead the two transformations are pipelined 
> together in Python, effectively ignoring the {{cache()}}.
> Note that {{cache()}} works properly if we call it before performing any 
> other transformations on the RDD:
> {code}
> rdd1 = sc.parallelize(range(100)).map(lambda x: x).cache()
> rdd2 = rdd1.map(lambda x: 2 * x)
> rdd2.collect()
> {code}
> This works as expected and caches {{rdd1}}.
> To fix this, I think we should decide dynamically whether to pipeline when we 
> actually perform actions, rather than deciding statically when we create the 
> RDDs.
> We should also add tests for this.
> (Thanks to [~tdas] for pointing out this issue.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3731) RDD caching stops working in pyspark after some time

2014-10-01 Thread Milan Straka (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156125#comment-14156125
 ] 

Milan Straka commented on SPARK-3731:
-

I will get to it later today and attach a dataset and program which exhibit 
this behaviour locally. I expect to be able to reproduce it, since I have seen 
this behaviour in many local runs.

> RDD caching stops working in pyspark after some time
> 
>
> Key: SPARK-3731
> URL: https://issues.apache.org/jira/browse/SPARK-3731
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 1.1.0
> Environment: Linux, 32-bit, both in local mode and in standalone 
> cluster mode
>Reporter: Milan Straka
> Attachments: worker.log
>
>
> Consider a file F which, when loaded with sc.textFile and cached, takes up 
> slightly more than half of the free memory available for the RDD cache.
> When in PySpark the following is executed:
>   1) a = sc.textFile(F)
>   2) a.cache().count()
>   3) b = sc.textFile(F)
>   4) b.cache().count()
> and then the following is repeated (for example 10 times):
>   a) a.unpersist().cache().count()
>   b) b.unpersist().cache().count()
> after some time, no RDDs are cached in memory.
> Also, since that time, no other RDD ever gets cached (the worker always 
> reports something like "WARN CacheManager: Not enough space to cache 
> partition rdd_23_5 in memory! Free memory is 277478190 bytes.", even if 
> rdd_23_5 is ~50MB). The Executors tab of the Application Detail UI shows that 
> all executors have 0MB memory used (which is consistent with the CacheManager 
> warning).
> When doing the same in Scala, everything works perfectly.
> I understand that this is a vague description, but I do not know how to 
> describe the problem better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3573) Dataset

2014-10-01 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156129#comment-14156129
 ] 

Patrick Wendell commented on SPARK-3573:


I think people are hung up on the term SQL - SchemaRDD is designed simply to 
represent richer types on top of the core RDD API. In fact, we originally thought 
of naming the package "schema" instead of "sql" for exactly this reason. 
SchemaRDD is in the sql/core package right now, but we could pull the public 
interface of a SchemaRDD into another package in the future (and maybe we'd 
drop exposing anything about the logical plan here).

I'd like to see a common representation of typed data used across both SQL 
and MLlib, and longer term across other libraries as well. I don't see an 
insurmountable semantic gap between an R-style data frame and a relational 
table. In fact, if you look across other projects today - almost all of them 
are trying to unify these types of data representations.

So I'd support seeing whether we can enhance or extend SchemaRDD to better 
support numeric data sets. And if we find there is just too large a gap 
here, then we could look at implementing a second dataset abstraction. If 
nothing else this is a test of whether SchemaRDD is sufficiently extensible to 
be useful in contexts beyond SQL (which is its original design).
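
As a rough illustration of "richer types on top of the core RDD API" for 
numeric data - a minimal sketch only, assuming a spark-shell style {{sc}} in 
scope and Spark 1.1's {{createSchemaRDD}} implicit; the case class and column 
names are made up:

{code}
import org.apache.spark.sql.SQLContext

case class LabeledExample(label: Double, weight: Double, country: String)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD[case class] -> SchemaRDD

// The same typed rows are visible both to SQL and, in principle, to an ML
// pipeline that reads the label/weight columns.
val examples = sc.parallelize(Seq(
  LabeledExample(1.0, 0.5, "US"),
  LabeledExample(0.0, 2.0, "CA")))
examples.registerTempTable("examples")
val usOnly = sqlContext.sql("SELECT label, weight FROM examples WHERE country = 'US'")
{code}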

> Dataset
> ---
>
> Key: SPARK-3573
> URL: https://issues.apache.org/jira/browse/SPARK-3573
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Critical
>
> This JIRA is for discussion of the ML dataset, essentially a SchemaRDD with extra 
> ML-specific metadata embedded in its schema.
> Sample code:
> Suppose we have training events stored on HDFS and user/ad features in Hive, 
> and we want to assemble features for training and then apply a decision tree.
> The proposed pipeline with the dataset looks like the following (it needs more 
> refinement):
> {code}
> sqlContext.jsonFile("/path/to/training/events", 
> 0.01).registerTempTable("event")
> val training = sqlContext.sql("""
>   SELECT event.id AS eventId, event.userId AS userId, event.adId AS adId, 
> event.action AS label,
>  user.gender AS userGender, user.country AS userCountry, 
> user.features AS userFeatures,
>  ad.targetGender AS targetGender
> FROM event JOIN user ON event.userId = user.id JOIN ad ON event.adId = 
> ad.id;""").cache()
> val indexer = new Indexer()
> val interactor = new Interactor()
> val fvAssembler = new FeatureVectorAssembler()
> val treeClassifier = new DecisionTreeClassifier()
> val paramMap = new ParamMap()
>   .put(indexer.features, Map("userCountryIndex" -> "userCountry"))
>   .put(indexer.sortByFrequency, true)
>   .put(interactor.features, Map("genderMatch" -> Array("userGender", 
> "targetGender")))
>   .put(fvAssembler.features, Map("features" -> Array("genderMatch", 
> "userCountryIndex", "userFeatures")))
>   .put(fvAssembler.dense, true)
>   .put(treeClassifier.maxDepth, 4) // By default, the classifier recognizes 
> "features" and "label" columns.
> val pipeline = Pipeline.create(indexer, interactor, fvAssembler, 
> treeClassifier)
> val model = pipeline.fit(training, paramMap)
> sqlContext.jsonFile("/path/to/events", 0.01).registerTempTable("event")
> val test = sqlContext.sql("""
>   SELECT event.id AS eventId, event.userId AS userId, event.adId AS adId,
>  user.gender AS userGender, user.country AS userCountry, 
> user.features AS userFeatures,
>  ad.targetGender AS targetGender
> FROM event JOIN user ON event.userId = user.id JOIN ad ON event.adId = 
> ad.id;""")
> val prediction = model.transform(test).select('eventId, 'prediction)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3371) Spark SQL: Renaming a function expression with group by gives error

2014-10-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3371.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2511
[https://github.com/apache/spark/pull/2511]

> Spark SQL: Renaming a function expression with group by gives error
> ---
>
> Key: SPARK-3371
> URL: https://issues.apache.org/jira/browse/SPARK-3371
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Pei-Lun Lee
> Fix For: 1.2.0
>
>
> {code}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> val rdd = sc.parallelize(List("""{"foo":"bar"}"""))
> sqlContext.jsonRDD(rdd).registerAsTable("t1")
> sqlContext.registerFunction("len", (s: String) => s.length)
> sqlContext.sql("select len(foo) as a, count(1) from t1 group by 
> len(foo)").collect()
> {code}
> Running the above code in spark-shell gives the following error:
> {noformat}
> 14/09/03 17:20:13 ERROR Executor: Exception in task 2.0 in stage 3.0 (TID 214)
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute, tree: foo#0
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4$$anonfun$apply$2.apply(TreeNode.scala:201)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:199)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>   at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> {noformat}
> remove "as a" in the query causes no error



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org