[jira] [Commented] (SPARK-13232) YARN executor node label expressions

2016-02-08 Thread Atkins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137222#comment-15137222
 ] 

Atkins commented on SPARK-13232:


If spark config "spark.yarn.executor.nodeLabelExpression" present, 
*org.apache.spark.deploy.yarn.YarnAllocator#createContainerRequest* will create 
a ContainerRequest instance with locality specification of nodes, racks, and 
nodelabel which cause InvalidContainerRequestException be thrown.
This can reproduce by adding test suite in 
*org.apache.spark.deploy.yarn.YarnAllocatorSuite*
{code}
  test("request executors with locality") {
    val handler = createAllocator(1)
    handler.updateResourceRequests()
    handler.getNumExecutorsRunning should be (0)
    handler.getPendingAllocate.size should be (1)

    handler.requestTotalExecutorsWithPreferredLocalities(3, 20,
      Map(("host1", 10), ("host2", 20)))
    handler.updateResourceRequests()
    handler.getPendingAllocate.size should be (3)

    val container = createContainer("host1")
    handler.handleAllocatedContainers(Array(container))

    handler.getNumExecutorsRunning should be (1)
    handler.allocatedContainerToHostMap.get(container.getId).get should be ("host1")
    handler.allocatedHostToContainersMap.get("host1").get should contain (container.getId)
  }
{code}
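
For reference, AMRMClient rejects any request that combines nodes/racks with a node 
label. One possible direction, sketched here purely as an illustration (the helper 
name and signature are made up; this is not the actual Spark patch), is to attach 
the label only to requests that carry no explicit locality:

{code}
import org.apache.hadoop.yarn.api.records.{Priority, Resource}
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

// Hypothetical helper: only the locality-free (ANY) request carries the node label,
// because AMRMClientImpl#checkNodeLabelExpression rejects the combination.
def buildRequest(
    resource: Resource,
    nodes: Array[String],
    racks: Array[String],
    priority: Priority,
    labelExpression: Option[String]): ContainerRequest = {
  if (nodes == null && racks == null) {
    // ANY request: a node label expression is accepted here.
    new ContainerRequest(resource, null, null, priority, true, labelExpression.orNull)
  } else {
    // Locality-constrained request: drop the label to avoid
    // InvalidContainerRequestException.
    new ContainerRequest(resource, nodes, racks, priority)
  }
}
{code}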

> YARN executor node label expressions
> 
>
> Key: SPARK-13232
> URL: https://issues.apache.org/jira/browse/SPARK-13232
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
> Environment: Scala 2.11.7,  Hadoop 2.7.2, Spark 1.6.0
>Reporter: Atkins
>Priority: Minor
>
> Using a node label expression for executors fails to request containers and 
> throws *InvalidContainerRequestException*.
> The code
> {code:title=AMRMClientImpl.java}
>   /**
>    * Valid if a node label expression specified on container request is valid or
>    * not
>    *
>    * @param containerRequest
>    */
>   private void checkNodeLabelExpression(T containerRequest) {
>     String exp = containerRequest.getNodeLabelExpression();
>
>     if (null == exp || exp.isEmpty()) {
>       return;
>     }
>
>     // Don't support specifying >= 2 node labels in a node label expression now
>     if (exp.contains("&&") || exp.contains("||")) {
>       throw new InvalidContainerRequestException(
>           "Cannot specify more than two node labels"
>               + " in a single node label expression");
>     }
>
>     // Don't allow specify node label against ANY request
>     if ((containerRequest.getRacks() != null &&
>         (!containerRequest.getRacks().isEmpty()))
>         ||
>         (containerRequest.getNodes() != null &&
>         (!containerRequest.getNodes().isEmpty()))) {
>       throw new InvalidContainerRequestException(
>           "Cannot specify node label with rack and node");
>     }
>   }
> {code}
> doesn't allow a node label to be combined with rack and node requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13104) Spark Metrics currently does not return executors hostname

2016-02-08 Thread Karthik (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik updated SPARK-13104:

Description: We have been using Spark Metrics and porting the data to InfluxDB 
using the Graphite sink that is available in Spark. From what I can see, it 
only provides the executorId and not the executor hostname. With each Spark 
job, the executorID changes. Is there any way to find the hostname based on the 
executorID?  (was: We been using Spark Metrics and porting the data to InfluxDB 
using the Graphite sink that is available in Spark. From what I can see, it 
only provides he executorId and not the executor hostname. With each spark job, 
the executorID changes. Is there any way to find the hostname based on the 
executorID?)

> Spark Metrics currently does not return executors hostname 
> ---
>
> Key: SPARK-13104
> URL: https://issues.apache.org/jira/browse/SPARK-13104
> Project: Spark
>  Issue Type: Question
>Reporter: Karthik
>Priority: Critical
>  Labels: executor, executorId, graphite, hostname, metrics
>
> We have been using Spark Metrics and porting the data to InfluxDB using the 
> Graphite sink that is available in Spark. From what I can see, it only 
> provides the executorId and not the executor hostname. With each Spark job, 
> the executorID changes. Is there any way to find the hostname based on the 
> executorID?
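
For anyone who needs the executorId-to-host mapping today, here is a minimal sketch 
(assuming a Scala application and the standard SparkListener API; the class name and 
usage are illustrative, not an existing Spark feature) that records each executor's 
host as it registers, which could then be attached to custom metrics:

{code}
import scala.collection.concurrent.TrieMap

import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded, SparkListenerExecutorRemoved}

// Keeps an executorId -> hostname map up to date from scheduler events.
class ExecutorHostListener extends SparkListener {
  val executorIdToHost = TrieMap.empty[String, String]

  override def onExecutorAdded(event: SparkListenerExecutorAdded): Unit = {
    executorIdToHost.put(event.executorId, event.executorInfo.executorHost)
  }

  override def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = {
    executorIdToHost.remove(event.executorId)
  }
}

// Usage sketch:
//   val listener = new ExecutorHostListener
//   sc.addSparkListener(listener)
//   ... run jobs ...
//   listener.executorIdToHost.get("3")   // e.g. Some("worker-host-name")
{code}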



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13232) YARN executor node label expressions

2016-02-08 Thread Atkins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137222#comment-15137222
 ] 

Atkins edited comment on SPARK-13232 at 2/8/16 4:59 PM:


I am pointing out that YARN does not allow specifying a node label together with 
racks or nodes, so the current version of Spark does not work when a node label 
is configured on YARN.

If the Spark config "spark.yarn.executor.nodeLabelExpression" is present, 
*org.apache.spark.deploy.yarn.YarnAllocator#createContainerRequest* creates a 
ContainerRequest that specifies nodes, racks, and a node label at the same time, 
which causes an InvalidContainerRequestException to be thrown.
This can be reproduced by adding the following test to 
*org.apache.spark.deploy.yarn.YarnAllocatorSuite*:
{code}
  test("request executors with locality") {
    val handler = createAllocator(1)
    handler.updateResourceRequests()
    handler.getNumExecutorsRunning should be (0)
    handler.getPendingAllocate.size should be (1)

    handler.requestTotalExecutorsWithPreferredLocalities(3, 20,
      Map(("host1", 10), ("host2", 20)))
    handler.updateResourceRequests()
    handler.getPendingAllocate.size should be (3)

    val container = createContainer("host1")
    handler.handleAllocatedContainers(Array(container))

    handler.getNumExecutorsRunning should be (1)
    handler.allocatedContainerToHostMap.get(container.getId).get should be ("host1")
    handler.allocatedHostToContainersMap.get("host1").get should contain (container.getId)
  }
{code}


was (Author: atkins):
If spark config "spark.yarn.executor.nodeLabelExpression" present, 
*org.apache.spark.deploy.yarn.YarnAllocator#createContainerRequest* will create 
a ContainerRequest instance with locality specification of nodes, racks, and 
nodelabel which cause InvalidContainerRequestException be thrown.
This can reproduce by adding test suite in 
*org.apache.spark.deploy.yarn.YarnAllocatorSuite*
{code}
  test("request executors with locality") {
    val handler = createAllocator(1)
    handler.updateResourceRequests()
    handler.getNumExecutorsRunning should be (0)
    handler.getPendingAllocate.size should be (1)

    handler.requestTotalExecutorsWithPreferredLocalities(3, 20,
      Map(("host1", 10), ("host2", 20)))
    handler.updateResourceRequests()
    handler.getPendingAllocate.size should be (3)

    val container = createContainer("host1")
    handler.handleAllocatedContainers(Array(container))

    handler.getNumExecutorsRunning should be (1)
    handler.allocatedContainerToHostMap.get(container.getId).get should be ("host1")
    handler.allocatedHostToContainersMap.get("host1").get should contain (container.getId)
  }
{code}

> YARN executor node label expressions
> 
>
> Key: SPARK-13232
> URL: https://issues.apache.org/jira/browse/SPARK-13232
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
> Environment: Scala 2.11.7,  Hadoop 2.7.2, Spark 1.6.0
>Reporter: Atkins
>Priority: Minor
>
> Using a node label expression for executors fails to request containers and 
> throws *InvalidContainerRequestException*.
> The code
> {code:title=AMRMClientImpl.java}
>   /**
>    * Valid if a node label expression specified on container request is valid or
>    * not
>    *
>    * @param containerRequest
>    */
>   private void checkNodeLabelExpression(T containerRequest) {
>     String exp = containerRequest.getNodeLabelExpression();
>
>     if (null == exp || exp.isEmpty()) {
>       return;
>     }
>
>     // Don't support specifying >= 2 node labels in a node label expression now
>     if (exp.contains("&&") || exp.contains("||")) {
>       throw new InvalidContainerRequestException(
>           "Cannot specify more than two node labels"
>               + " in a single node label expression");
>     }
>
>     // Don't allow specify node label against ANY request
>     if ((containerRequest.getRacks() != null &&
>         (!containerRequest.getRacks().isEmpty()))
>         ||
>         (containerRequest.getNodes() != null &&
>         (!containerRequest.getNodes().isEmpty()))) {
>       throw new InvalidContainerRequestException(
>           "Cannot specify node label with rack and node");
>     }
>   }
> {code}
> doesn't allow a node label to be combined with rack and node requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13233) Python Dataset

2016-02-08 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-13233:
---

 Summary: Python Dataset
 Key: SPARK-13233
 URL: https://issues.apache.org/jira/browse/SPARK-13233
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13233) Python Dataset

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137180#comment-15137180
 ] 

Apache Spark commented on SPARK-13233:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/7

> Python Dataset
> --
>
> Key: SPARK-13233
> URL: https://issues.apache.org/jira/browse/SPARK-13233
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Wenchen Fan
> Attachments: DesignDocPythonDataset.pdf
>
>
> add Python Dataset w.r.t. the scala version



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13233) Python Dataset

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13233:


Assignee: (was: Apache Spark)

> Python Dataset
> --
>
> Key: SPARK-13233
> URL: https://issues.apache.org/jira/browse/SPARK-13233
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Wenchen Fan
> Attachments: DesignDocPythonDataset.pdf
>
>
> add Python Dataset w.r.t. the scala version



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated

2016-02-08 Thread sachin aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137136#comment-15137136
 ] 

sachin aggarwal commented on SPARK-13172:
-

Instead of getStackTraceString, should I use e.getStackTrace or 
e.printStackTrace?
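
For what it's worth, a minimal sketch of the non-deprecated route (building the 
string from Throwable.getStackTrace, which is roughly what getStackTraceString did); 
this is only an illustration, not necessarily what the eventual patch should use:

{code}
// Render a Throwable's stack trace as a String using only java.lang APIs.
def stackTraceToString(e: Throwable): String =
  e.getStackTrace.map(_.toString).mkString("", System.lineSeparator(), System.lineSeparator())
{code}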

> Stop using RichException.getStackTrace it is deprecated
> ---
>
> Key: SPARK-13172
> URL: https://issues.apache.org/jira/browse/SPARK-13172
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> Throwable getStackTrace is the recommended alternative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-10528) spark-shell throws java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable.

2016-02-08 Thread Sangeet Chourey (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137168#comment-15137168
 ] 

Sangeet Chourey edited comment on SPARK-10528 at 2/8/16 4:26 PM:
-

RESOLVED: Downloading the correct Winutils version resolved the issue. Ideally 
it should be compiled locally, but if you download a prebuilt binary, make sure 
it matches your 32/64-bit platform.

I tried this on Windows 7 64-bit with Spark 1.6, downloaded winutils.exe from 
https://www.barik.net/archive/2015/01/19/172716/, and it worked!

Complete steps are at: 
http://letstalkspark.blogspot.com/2016/02/getting-started-with-spark-on-window-64.html


was (Author: sybergeek):
RESOLVED  : Downloaded the correct Winutils version and issue was resolved. 
Ideally, it should be locally compiled but if downloading compiled version make 
sure that it is 32/64 bit as applicable. 

I tried on Windows 7 64 bit, Spark 1.6 and downloaded winutils.exe from 
https://www.barik.net/archive/2015/01/19/172716/  and it worked..!!


> spark-shell throws java.lang.RuntimeException: The root scratch dir: 
> /tmp/hive on HDFS should be writable.
> --
>
> Key: SPARK-10528
> URL: https://issues.apache.org/jira/browse/SPARK-10528
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.5.0
> Environment: Windows 7 x64
>Reporter: Aliaksei Belablotski
>Priority: Minor
>
> Starting spark-shell throws
> java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: 
> /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13233) Python Dataset

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13233:


Assignee: Apache Spark

> Python Dataset
> --
>
> Key: SPARK-13233
> URL: https://issues.apache.org/jira/browse/SPARK-13233
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
> Attachments: DesignDocPythonDataset.pdf
>
>
> add Python Dataset w.r.t. the scala version



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10066) Can't create HiveContext with spark-shell or spark-sql on snapshot

2016-02-08 Thread Sangeet Chourey (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137201#comment-15137201
 ] 

Sangeet Chourey commented on SPARK-10066:
-

RESOLVED: Downloading the correct Winutils version resolved the issue. Ideally 
it should be compiled locally, but if you download a prebuilt binary, make sure 
it matches your 32/64-bit platform.
I tried this on Windows 7 64-bit with Spark 1.6, downloaded winutils.exe from 
https://www.barik.net/archive/2015/01/19/172716/, and it worked!
Complete steps are at: 
http://letstalkspark.blogspot.com/2016/02/getting-started-with-spark-on-window-64.html
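
To make those steps concrete, a small sketch of the setup described above; the 
C:\hadoop path is only an assumption for the example:

{code}
// Assumes winutils.exe was placed under C:\hadoop\bin.
// Hadoop's Shell utility reads this property (or the HADOOP_HOME environment variable).
System.setProperty("hadoop.home.dir", "C:\\hadoop")

// If /tmp/hive is still reported as not writable, run once from a command prompt:
//   C:\hadoop\bin\winutils.exe chmod -R 777 \tmp\hive
{code}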

> Can't create HiveContext with spark-shell or spark-sql on snapshot
> --
>
> Key: SPARK-10066
> URL: https://issues.apache.org/jira/browse/SPARK-10066
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 1.5.0
> Environment: Centos 6.6
>Reporter: Robert Beauchemin
>Priority: Minor
>
> Built the 1.5.0-preview-20150812 with the following:
> ./make-distribution.sh -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive 
> -Phive-thriftserver -Psparkr -DskipTests
> Starting spark-shell or spark-sql returns the following error: 
> java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: 
> /tmp/hive on HDFS should be writable. Current permissions are: rwx--
> at 
> org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
>  [elided]
> at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)   
> 
> It's trying to create a new HiveContext. Running pySpark or sparkR works and 
> creates a HiveContext successfully. SqlContext can be created successfully 
> with any shell.
> I've tried changing permissions on that HDFS directory (even as far as making 
> it world-writable) without success. Tried changing SPARK_USER and also 
> running spark-shell as different users without success.
> This works on same machine on 1.4.1 and on earlier pre-release versions of 
> Spark 1.5.0 (same make-distribution parms) sucessfully. Just trying the 
> snapshot... 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13233) Python Dataset

2016-02-08 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-13233:

Attachment: DesignDocPythonDataset.pdf

> Python Dataset
> --
>
> Key: SPARK-13233
> URL: https://issues.apache.org/jira/browse/SPARK-13233
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Wenchen Fan
> Attachments: DesignDocPythonDataset.pdf
>
>
> add Python Dataset w.r.t. the scala version



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13233) Python Dataset

2016-02-08 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-13233:

Description: add Python Dataset w.r.t. the scala version

> Python Dataset
> --
>
> Key: SPARK-13233
> URL: https://issues.apache.org/jira/browse/SPARK-13233
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Wenchen Fan
> Attachments: DesignDocPythonDataset.pdf
>
>
> add Python Dataset w.r.t. the scala version



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10528) spark-shell throws java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable.

2016-02-08 Thread Sangeet Chourey (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137168#comment-15137168
 ] 

Sangeet Chourey commented on SPARK-10528:
-

RESOLVED: Downloading the correct Winutils version resolved the issue. Ideally 
it should be compiled locally, but if you download a prebuilt binary, make sure 
it matches your 32/64-bit platform.

I tried this on Windows 7 64-bit with Spark 1.6, downloaded winutils.exe from 
https://www.barik.net/archive/2015/01/19/172716/, and it worked!


> spark-shell throws java.lang.RuntimeException: The root scratch dir: 
> /tmp/hive on HDFS should be writable.
> --
>
> Key: SPARK-10528
> URL: https://issues.apache.org/jira/browse/SPARK-10528
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.5.0
> Environment: Windows 7 x64
>Reporter: Aliaksei Belablotski
>Priority: Minor
>
> Starting spark-shell throws
> java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: 
> /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11714) Make Spark on Mesos honor port restrictions

2016-02-08 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136747#comment-15136747
 ] 

Stavros Kontopoulos commented on SPARK-11714:
-

[~andrewor14] Would it make sense to move this code to the coarse-grained 
scheduler, since fine-grained mode is now deprecated?

> Make Spark on Mesos honor port restrictions
> ---
>
> Key: SPARK-11714
> URL: https://issues.apache.org/jira/browse/SPARK-11714
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Charles Allen
>
> Currently the MesosSchedulerBackend does not make any effort to honor "ports" 
> as a resource offer in Mesos. This ask is to have the ports which the 
> executor binds to honor the limits of the "ports" resource of an offer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2016-02-08 Thread Herman Schistad (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136757#comment-15136757
 ] 

Herman Schistad edited comment on SPARK-13198 at 2/8/16 9:55 AM:
-

Hi [~srowen], thanks for your reply. I have indeed tried to look at the program 
using a profiler and I've attached two screenshots ([one|^Screen Shot 
2016-02-08 at 09.31.10.png] and [two|^Screen Shot 2016-02-08 at 09.30.59.png]) 
from jvisualvm connected to the driver JMX interface. You can see that the "Old 
Gen" space is completely full. You see that dip at 09:30:00? That's me 
triggering a manual GC.

It might be unusual to do this, but in any case (given the existence of 
sc.stop()) it should work right? My use case is having X number of different 
parquet directories which need to be loaded and analysed linearly, as part of a 
generic platform where users are able to upload data and apply daily/hourly 
aggregations on them. I've also seen people starting and stopping contexts 
quite frequently when doing unit tests etc.

Using G1 garbage collection doesn't seem to affect the end result either.

I'm also attaching a [GC log|^gc.log] in its raw format. You can see it is 
trying to do a full GC multiple times during the execution of the program.

Thanks again Sean.


was (Author: hermansc):
Hi [~srowen], thanks for your reply. I have indeed tried to look at the program 
using a profiler and I've attached two screenshots from jvisualvm connected to 
the driver JMX interface. You can see that the "Old Gen" space is completely 
full. You see that dip at 09:30:00? That's me triggering a manual GC.

It might be unusual to do this, but in any case (given the existence of 
sc.stop()) it should work right? My use case is having X number of different 
parquet directories which need to be loaded and analysed linearly, as part of a 
generic platform where users are able to upload data and apply daily/hourly 
aggregations on them. I've also seen people starting and stopping contexts 
quite frequently when doing unit tests etc.

Using G1 garbage collection doesn't seem to affect the end result either.

I'm also attaching a GC log in it's raw format. You can see it's trying to do a 
full GC at multiple times during the execution of the program.

Thanks again Sean.

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png, Screen 
> Shot 2016-02-08 at 09.30.59.png, Screen Shot 2016-02-08 at 09.31.10.png, 
> gc.log
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: 

[jira] [Assigned] (SPARK-13177) Update ActorWordCount example to not directly use low level linked list as it is deprecated.

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13177:


Assignee: (was: Apache Spark)

> Update ActorWordCount example to not directly use low level linked list as it 
> is deprecated.
> 
>
> Key: SPARK-13177
> URL: https://issues.apache.org/jira/browse/SPARK-13177
> Project: Spark
>  Issue Type: Sub-task
>  Components: Examples
>Reporter: holdenk
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7848) Update SparkStreaming docs to incorporate FAQ and/or bullets w/ "knobs" information.

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7848:
---

Assignee: Apache Spark

> Update SparkStreaming docs to incorporate FAQ and/or bullets w/ "knobs" 
> information.
> 
>
> Key: SPARK-7848
> URL: https://issues.apache.org/jira/browse/SPARK-7848
> Project: Spark
>  Issue Type: Documentation
>  Components: Streaming
>Reporter: jay vyas
>Assignee: Apache Spark
>
> A recent email on the mailing list detailed a bunch of great "knobs" to 
> remember for Spark Streaming. 
> Let's integrate this into the docs where appropriate.
> I'll paste the raw text in a comment field below



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7848) Update SparkStreaming docs to incorporate FAQ and/or bullets w/ "knobs" information.

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7848:
---

Assignee: (was: Apache Spark)

> Update SparkStreaming docs to incorporate FAQ and/or bullets w/ "knobs" 
> information.
> 
>
> Key: SPARK-7848
> URL: https://issues.apache.org/jira/browse/SPARK-7848
> Project: Spark
>  Issue Type: Documentation
>  Components: Streaming
>Reporter: jay vyas
>
> A recent email on the mailing list detailed a bunch of great "knobs" to 
> remember for Spark Streaming. 
> Let's integrate this into the docs where appropriate.
> I'll paste the raw text in a comment field below



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7848) Update SparkStreaming docs to incorporate FAQ and/or bullets w/ "knobs" information.

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136849#comment-15136849
 ] 

Apache Spark commented on SPARK-7848:
-

User 'nirmannarang' has created a pull request for this issue:
https://github.com/apache/spark/pull/4

> Update SparkStreaming docs to incorporate FAQ and/or bullets w/ "knobs" 
> information.
> 
>
> Key: SPARK-7848
> URL: https://issues.apache.org/jira/browse/SPARK-7848
> Project: Spark
>  Issue Type: Documentation
>  Components: Streaming
>Reporter: jay vyas
>
> A recent email on the mailing list detailed a bunch of great "knobs" to 
> remember for Spark Streaming. 
> Let's integrate this into the docs where appropriate.
> I'll paste the raw text in a comment field below



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2016-02-08 Thread Herman Schistad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman Schistad updated SPARK-13198:

Attachment: Screen Shot 2016-02-08 at 09.30.59.png
Screen Shot 2016-02-08 at 09.31.10.png
gc.log

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png, Screen 
> Shot 2016-02-08 at 09.30.59.png, Screen Shot 2016-02-08 at 09.31.10.png, 
> gc.log
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-13156) JDBC using multiple partitions creates additional tasks but only executes on one

2016-02-08 Thread Charles Drotar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Drotar updated SPARK-13156:
---
Comment: was deleted

(was: Thanks Sean. The driver inhibiting the concurrent connections was the 
issue. Apparently the Teradata driver does not support concurrent connections 
and instead suggests creating different sessions for each query. I don't think 
this is truly an issue so I will close out the JIRA.)

> JDBC using multiple partitions creates additional tasks but only executes on 
> one
> 
>
> Key: SPARK-13156
> URL: https://issues.apache.org/jira/browse/SPARK-13156
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 1.5.0
> Environment: Hadoop 2.6.0-cdh5.4.0, Teradata, yarn-client
>Reporter: Charles Drotar
>
> I can successfully kick off a query through JDBC to Teradata, and when it 
> runs it creates a task on each executor for every partition. The problem is 
> that all of the tasks except for one complete within a couple seconds and the 
> final task handles the entire dataset.
> Example Code:
> private val properties = new java.util.Properties()
> properties.setProperty("driver","com.teradata.jdbc.TeraDriver")
> properties.setProperty("username","foo")
> properties.setProperty("password","bar")
> val url = "jdbc:teradata://oneview/, TMODE=TERA,TYPE=FASTEXPORT,SESSIONS=10"
> val numPartitions = 5
> val dbTableTemp = "( SELECT  id MOD $numPartitions%d AS modulo, id FROM 
> db.table) AS TEMP_TABLE"
> val partitionColumn = "modulo"
> val lowerBound = 0.toLong
> val upperBound = (numPartitions-1).toLong
> val df = 
> sqlContext.read.jdbc(url,dbTableTemp,partitionColumn,lowerBound,upperBound,numPartitions,properties)
> df.write.parquet("/output/path/for/df/")
> When I look at the Spark UI I see the 5 tasks, but only 1 is actually 
> querying.
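
For context, a simplified sketch (not the exact Spark internals) of how the 
lowerBound/upperBound/numPartitions arguments become per-partition predicates; this 
explains why five tasks show up in the UI, while how much work each task actually 
gets still depends on the data distribution and the JDBC driver:

{code}
// Simplified sketch of JDBC range partitioning: each partition queries with its own
// predicate on the partition column; rows are never redistributed afterwards.
def partitionPredicates(
    column: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
  val stride = math.max((upper - lower) / numPartitions, 1L)
  (0 until numPartitions).map { i =>
    val lo = lower + i * stride
    val hi = lo + stride
    if (i == 0) s"$column < $hi"
    else if (i == numPartitions - 1) s"$column >= $lo"
    else s"$column >= $lo AND $column < $hi"
  }
}

// With lower = 0, upper = 4 and 5 partitions this yields
// "modulo < 1", "modulo >= 1 AND modulo < 2", ..., "modulo >= 4".
{code}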



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2016-02-08 Thread Herman Schistad (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136774#comment-15136774
 ] 

Herman Schistad commented on SPARK-13198:
-

Digging more into the dumped heap and running a memory leak report (using 
Eclipse Memory Analyzer) I'm seeing the following result:

!Screen Shot 2016-02-08 at 10.03.04.png|width=400!

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png, Screen 
> Shot 2016-02-08 at 09.30.59.png, Screen Shot 2016-02-08 at 09.31.10.png, 
> Screen Shot 2016-02-08 at 10.03.04.png, gc.log
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13231) Rename Accumulable.countFailedValues to Accumulable.includeValuesOfFailedTasks and make it a user facing API.

2016-02-08 Thread Prashant Sharma (JIRA)
Prashant Sharma created SPARK-13231:
---

 Summary: Rename Accumulable.countFailedValues to 
Accumulable.includeValuesOfFailedTasks and make it a user facing API.
 Key: SPARK-13231
 URL: https://issues.apache.org/jira/browse/SPARK-13231
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.6.0
Reporter: Prashant Sharma
Priority: Minor


Rename Accumulable.countFailedValues to Accumulable.includeValuesOfFailedTasks 
(or includeFailedTasks); I prefer the longer version.

Exposing it to users has no disadvantage I can think of, and it can be useful 
for them. One scenario is a user-defined metric.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13156) JDBC using multiple partitions creates additional tasks but only executes on one

2016-02-08 Thread Charles Drotar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136834#comment-15136834
 ] 

Charles Drotar commented on SPARK-13156:


Thanks Sean. The driver inhibiting the concurrent connections was the issue. 
Apparently the Teradata driver does not support concurrent connections and 
instead suggests creating different sessions for each query. I don't think this 
is truly an issue so I will close out the JIRA.

> JDBC using multiple partitions creates additional tasks but only executes on 
> one
> 
>
> Key: SPARK-13156
> URL: https://issues.apache.org/jira/browse/SPARK-13156
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 1.5.0
> Environment: Hadoop 2.6.0-cdh5.4.0, Teradata, yarn-client
>Reporter: Charles Drotar
>
> I can successfully kick off a query through JDBC to Teradata, and when it 
> runs it creates a task on each executor for every partition. The problem is 
> that all of the tasks except for one complete within a couple seconds and the 
> final task handles the entire dataset.
> Example Code:
> private val properties = new java.util.Properties()
> properties.setProperty("driver","com.teradata.jdbc.TeraDriver")
> properties.setProperty("username","foo")
> properties.setProperty("password","bar")
> val url = "jdbc:teradata://oneview/, TMODE=TERA,TYPE=FASTEXPORT,SESSIONS=10"
> val numPartitions = 5
> val dbTableTemp = "( SELECT  id MOD $numPartitions%d AS modulo, id FROM 
> db.table) AS TEMP_TABLE"
> val partitionColumn = "modulo"
> val lowerBound = 0.toLong
> val upperBound = (numPartitions-1).toLong
> val df = 
> sqlContext.read.jdbc(url,dbTableTemp,partitionColumn,lowerBound,upperBound,numPartitions,properties)
> df.write.parquet("/output/path/for/df/")
> When I look at the Spark UI I see the 5 tasks, but only 1 is actually 
> querying.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-13156) JDBC using multiple partitions creates additional tasks but only executes on one

2016-02-08 Thread Charles Drotar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Drotar closed SPARK-13156.
--
Resolution: Not A Problem

The driver class was inhibiting concurrent connections. This was unrelated to 
Spark's jdbc functionality.

> JDBC using multiple partitions creates additional tasks but only executes on 
> one
> 
>
> Key: SPARK-13156
> URL: https://issues.apache.org/jira/browse/SPARK-13156
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 1.5.0
> Environment: Hadoop 2.6.0-cdh5.4.0, Teradata, yarn-client
>Reporter: Charles Drotar
>
> I can successfully kick off a query through JDBC to Teradata, and when it 
> runs it creates a task on each executor for every partition. The problem is 
> that all of the tasks except for one complete within a couple seconds and the 
> final task handles the entire dataset.
> Example Code:
> private val properties = new java.util.Properties()
> properties.setProperty("driver","com.teradata.jdbc.TeraDriver")
> properties.setProperty("username","foo")
> properties.setProperty("password","bar")
> val url = "jdbc:teradata://oneview/, TMODE=TERA,TYPE=FASTEXPORT,SESSIONS=10"
> val numPartitions = 5
> val dbTableTemp = "( SELECT  id MOD $numPartitions%d AS modulo, id FROM 
> db.table) AS TEMP_TABLE"
> val partitionColumn = "modulo"
> val lowerBound = 0.toLong
> val upperBound = (numPartitions-1).toLong
> val df = 
> sqlContext.read.jdbc(url,dbTableTemp,partitionColumn,lowerBound,upperBound,numPartitions,properties)
> df.write.parquet("/output/path/for/df/")
> When I look at the Spark UI I see the 5 tasks, but only 1 is actually 
> querying.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2016-02-08 Thread Herman Schistad (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136757#comment-15136757
 ] 

Herman Schistad commented on SPARK-13198:
-

Hi [~srowen], thanks for your reply. I have indeed tried to look at the program 
using a profiler and I've attached two screenshots from jvisualvm connected to 
the driver JMX interface. You can see that the "Old Gen" space is completely 
full. You see that dip at 09:30:00? That's me triggering a manual GC.

It might be unusual to do this, but in any case (given the existence of 
sc.stop()) it should work right? My use case is having X number of different 
parquet directories which need to be loaded and analysed linearly, as part of a 
generic platform where users are able to upload data and apply daily/hourly 
aggregations on them. I've also seen people starting and stopping contexts 
quite frequently when doing unit tests etc.

Using G1 garbage collection doesn't seem to affect the end result either.

I'm also attaching a GC log in its raw format. You can see it is trying to do a 
full GC multiple times during the execution of the program.

Thanks again Sean.

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2016-02-08 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136773#comment-15136773
 ] 

Sean Owen commented on SPARK-13198:
---

I don't think stop() is relevant here. There's not an active attempt to free up 
resources once the app is done. It's assumed the driver JVM is shutting down.

Yes, the question was whether it had tried to do a full GC, and it sounds like 
it has, OK.

Still, if you're just finding there is a bunch of leftover bookkeeping info for 
executors, probably from all the old contexts, I think that's "normal" or at 
least "not a problem as Spark is intended to be used".

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png, Screen 
> Shot 2016-02-08 at 09.30.59.png, Screen Shot 2016-02-08 at 09.31.10.png, 
> Screen Shot 2016-02-08 at 10.03.04.png, gc.log
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2016-02-08 Thread Herman Schistad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman Schistad updated SPARK-13198:

Attachment: Screen Shot 2016-02-08 at 10.03.04.png

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png, Screen 
> Shot 2016-02-08 at 09.30.59.png, Screen Shot 2016-02-08 at 09.31.10.png, 
> Screen Shot 2016-02-08 at 10.03.04.png, gc.log
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13177) Update ActorWordCount example to not directly use low level linked list as it is deprecated.

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136791#comment-15136791
 ] 

Apache Spark commented on SPARK-13177:
--

User 'agsachin' has created a pull request for this issue:
https://github.com/apache/spark/pull/3

> Update ActorWordCount example to not directly use low level linked list as it 
> is deprecated.
> 
>
> Key: SPARK-13177
> URL: https://issues.apache.org/jira/browse/SPARK-13177
> Project: Spark
>  Issue Type: Sub-task
>  Components: Examples
>Reporter: holdenk
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13177) Update ActorWordCount example to not directly use low level linked list as it is deprecated.

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13177:


Assignee: Apache Spark

> Update ActorWordCount example to not directly use low level linked list as it 
> is deprecated.
> 
>
> Key: SPARK-13177
> URL: https://issues.apache.org/jira/browse/SPARK-13177
> Project: Spark
>  Issue Type: Sub-task
>  Components: Examples
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13117) WebUI should use the local ip not 0.0.0.0

2016-02-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138251#comment-15138251
 ] 

Devaraj K commented on SPARK-13117:
---

Thanks [~jjordan].

> WebUI should use the local ip not 0.0.0.0
> -
>
> Key: SPARK-13117
> URL: https://issues.apache.org/jira/browse/SPARK-13117
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: Jeremiah Jordan
>
> When SPARK_LOCAL_IP is set everything seems to correctly bind and use that IP 
> except the WebUI.  The WebUI should use the SPARK_LOCAL_IP not always use 
> 0.0.0.0
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/WebUI.scala#L137
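
For illustration, a sketch of the change the report suggests (not an actual patch; 
the surrounding call site is only referenced, not reproduced): derive the bind host 
from SPARK_LOCAL_IP instead of hard-coding 0.0.0.0.

{code}
// Sketch only: prefer SPARK_LOCAL_IP when it is set, otherwise keep today's behaviour.
val bindHost: String =
  sys.env.get("SPARK_LOCAL_IP").map(_.trim).filter(_.nonEmpty).getOrElse("0.0.0.0")

// bind() would then pass bindHost to the Jetty server instead of the literal "0.0.0.0".
{code}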



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138289#comment-15138289
 ] 

Xiao Li edited comment on SPARK-13219 at 2/9/16 3:50 AM:
-

Could you wait for me to fix another issue? When I tried your query, I found a 
bug in attribute resolution in the latest build. I need to fix it now. Thanks!


was (Author: smilegator):
Could you wait for me to fix another issue? When I tried your query, I found a 
bug in the latest build. I need to fix it now. Thanks!

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on the attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138297#comment-15138297
 ] 

Xiao Li commented on SPARK-13219:
-

It should work without Join clauses, as long as these predicates are pushed 
into the join conditions. If they are only in filters, this is not applicable 
and we should not push down the inferred conditions.

However, the current PR is unable to infer the conditions when needing multiple 
hops. Actually, I am glad to work on it, if needed. 

Let me at [~rxin] [~marmbrus]. 
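
To make this concrete with the tables from the report: the inference under discussion would behave as if the redundant predicate had been written by hand. A spark-shell illustration of the intended rewrite (hand-written, not the optimizer's actual output):

{code}
// Original query: only tenants.assetid carries the literal filter, so the
// assets side is scanned in full.
val original = sqlContext.sql(
  """SELECT t.name
    |FROM tenants t, assets a
    |WHERE a.assetid = t.assetid AND t.assetid = '1201'""".stripMargin)

// With the join equality a.assetid = t.assetid, the same literal can be
// propagated to the assets scan as well.
val propagated = sqlContext.sql(
  """SELECT t.name
    |FROM tenants t, assets a
    |WHERE a.assetid = t.assetid
    |  AND t.assetid = '1201'
    |  AND a.assetid = '1201'""".stripMargin)

propagated.explain()  // both table scans should now be filtered on assetid
{code}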

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on the attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13238) Add ganglia dmax parameter

2016-02-08 Thread Ekasit Kijsipongse (JIRA)
Ekasit Kijsipongse created SPARK-13238:
--

 Summary: Add ganglia dmax parameter
 Key: SPARK-13238
 URL: https://issues.apache.org/jira/browse/SPARK-13238
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.6.0
Reporter: Ekasit Kijsipongse
Priority: Minor


The current Ganglia reporter doesn't set a metric expiration time (dmax), so the 
metrics of all finished applications remain displayed in the Ganglia web UI 
indefinitely. The dmax parameter lets the user set the lifetime of the metrics. 
The default value is 0, for compatibility with previous versions.
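
For illustration, the sink could then be configured in conf/metrics.properties along these lines. The existing keys are the standard GangliaSink options; the dmax key and its exact name are assumptions based on this proposal:

{code}
# conf/metrics.properties (sketch)
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host=gmond.example.com
*.sink.ganglia.port=8649
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
# Proposed: expire a metric one hour after the application stops reporting it.
*.sink.ganglia.dmax=3600
{code}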



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-13219:

Comment: was deleted

(was: Could you wait for me to fix another issue? When I tried your query, I 
found a bug in attribute resolution in the latest build. I need to fix it now. 
Thanks!)

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on the attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138404#comment-15138404
 ] 

Xiao Li commented on SPARK-13219:
-

Sorry, there is a bug in the original PR. You just need to change the code 
based on the fix:

https://github.com/gatorsmile/spark/commit/20d46c9bee2d99966406e6450b159ca404578aa6

Let me know if it works now. Thanks!

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on the attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12992) Vectorize parquet decoding using ColumnarBatch

2016-02-08 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12992.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11055
[https://github.com/apache/spark/pull/11055]

> Vectorize parquet decoding using ColumnarBatch
> --
>
> Key: SPARK-12992
> URL: https://issues.apache.org/jira/browse/SPARK-12992
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Nong Li
>Assignee: Apache Spark
> Fix For: 2.0.0
>
>
> Parquet files benefit from vectorized decoding. ColumnarBatches have been 
> designed to support this. This means that a single encoded parquet column is 
> decoded to a single ColumnVector. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated

2016-02-08 Thread sachin aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138423#comment-15138423
 ] 

sachin aggarwal edited comment on SPARK-13172 at 2/9/16 6:49 AM:
-

There are two ways we can proceed: the first uses printStackTrace and the second 
uses mkString().

1) printStackTrace can be encapsulated in a helper function:
{code}
def getStackTraceAsString(t: Throwable) = {
val sw = new StringWriter
t.printStackTrace(new PrintWriter(sw))
sw.toString
}
{code}

2) println(t.getStackTrace.mkString("\n"))

The mkString approach gives exactly the same string as the old getStackTraceString, 
but the output of the first approach is more readable.
h3.Example

{code:title=TrySuccessFailure.scala|borderStyle=solid}
import scala.util.{Try, Success, Failure}
import java.io._

object TrySuccessFailure extends App {

  badAdder(3) match {
case Success(i) => println(s"success, i = $i")
case Failure(t) =>
  // this works, but it's not too useful/readable
  println(t.getStackTrace.mkString("\n"))
  println("===")
  println(t.getStackTraceString)
  // this works much better
  val sw = new StringWriter
  t.printStackTrace(new PrintWriter(sw))
  println(sw.toString)
  }
  def badAdder(a: Int): Try[Int] = {
Try({
  val b = a + 1
  if (b == 3) b else {
val ioe = new IOException("Boom!")
throw new AlsException("Bummer!", ioe)
  }
})
  }
  class AlsException(s: String, e: Exception) extends Exception(s: String, e: 
Exception)
}
{code}
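
For completeness, a small self-contained usage sketch of the first approach (the caller code here is hypothetical, not part of the proposed change):

{code}
import java.io.{IOException, PrintWriter, StringWriter}

object StackTraceDemo extends App {
  // Replacement for the deprecated t.getStackTraceString: also prints the
  // "Caused by" chain, which mkString on getStackTrace does not.
  def getStackTraceAsString(t: Throwable): String = {
    val sw = new StringWriter
    t.printStackTrace(new PrintWriter(sw))
    sw.toString
  }

  try {
    throw new IllegalStateException("Bummer!", new IOException("Boom!"))
  } catch {
    case t: Throwable => println(getStackTraceAsString(t))
  }
}
{code}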



was (Author: sachin aggarwal):
There are two ways we can proceed: the first uses printStackTrace and the second 
uses mkString().

1) printStackTrace can be encapsulated in a helper function:
{code}
def getStackTraceAsString(t: Throwable) = {
val sw = new StringWriter
t.printStackTrace(new PrintWriter(sw))
sw.toString
}
{code}

2) println(t.getStackTrace.mkString("\n"))

The mkString approach gives exactly the same string as the old getStackTraceString, 
but the output of the first approach is more readable.
h3.Example

{code:title=TrySuccessFailure.scala|borderStyle=solid}
import scala.util.{Try, Success, Failure}
import java.io._

object TrySuccessFailure extends App {

  badAdder(3) match {
case Success(i) => println(s"success, i = $i")
case Failure(t) =>
  // this works, but it's not too useful/readable
  println(t.getStackTrace.mkString("\n"))
  println("===")
  println(t.getStackTraceString)
  // this works much better
  val sw = new StringWriter
  t.printStackTrace(new PrintWriter(sw))
  println(sw.toString)
  }
  def badAdder(a: Int): Try[Int] = {
Try({
  val b = a + 1
  if (b == 3) b else {
val ioe = new IOException("Boom!")
throw new AlsException("Bummer!", ioe)
  }
})
  }
  class AlsException(s: String, e: Exception) extends Exception(s: String, e: 
Exception)
}
{code}


> Stop using RichException.getStackTrace it is deprecated
> ---
>
> Key: SPARK-13172
> URL: https://issues.apache.org/jira/browse/SPARK-13172
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> Throwable getStackTrace is the recommended alternative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13017) Replace example code in mllib-feature-extraction.md using include_example

2016-02-08 Thread Xin Ren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138310#comment-15138310
 ] 

Xin Ren commented on SPARK-13017:
-

I'm working on this one, thanks :)

> Replace example code in mllib-feature-extraction.md using include_example
> -
>
> Key: SPARK-13017
> URL: https://issues.apache.org/jira/browse/SPARK-13017
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Xusen Yin
>Priority: Minor
>  Labels: starter
>
> See examples in other finished sub-JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated

2016-02-08 Thread sachin aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138423#comment-15138423
 ] 

sachin aggarwal edited comment on SPARK-13172 at 2/9/16 6:33 AM:
-

There are two ways we can proceed: the first uses printStackTrace and the second 
uses mkString().

1) printStackTrace can be encapsulated in a helper function:
def getStackTraceAsString(t: Throwable) = {
val sw = new StringWriter
t.printStackTrace(new PrintWriter(sw))
sw.toString
}

2) println(t.getStackTrace.mkString("\n"))


The mkString approach gives exactly the same string as the old getStackTraceString, 
but the output of the first approach is more readable.

Try this code to see the difference:

{code:title=TrySuccessFailure.scala|borderStyle=solid}
import scala.util.{Try, Success, Failure}
import java.io._

object TrySuccessFailure extends App {

  badAdder(3) match {
case Success(i) => println(s"success, i = $i")
case Failure(t) =>
  // this works, but it's not too useful/readable
  println(t.getStackTrace.mkString("\n"))
  println("===")
  println(t.getStackTraceString)
  // this works much better
  val sw = new StringWriter
  t.printStackTrace(new PrintWriter(sw))
  println(sw.toString)
  }
  def badAdder(a: Int): Try[Int] = {
Try({
  val b = a + 1
  if (b == 3) b else {
val ioe = new IOException("Boom!")
throw new AlsException("Bummer!", ioe)
  }
})
  }
  class AlsException(s: String, e: Exception) extends Exception(s: String, e: 
Exception)
}
{code}



was (Author: sachin aggarwal):
There are two ways we can proceed: the first uses printStackTrace and the second 
uses mkString().

1) printStackTrace can be encapsulated in a helper function:
def getStackTraceAsString(t: Throwable) = {
val sw = new StringWriter
t.printStackTrace(new PrintWriter(sw))
sw.toString
}

2) println(t.getStackTrace.mkString("\n"))


The mkString approach gives exactly the same string as the old getStackTraceString, 
but the output of the first approach is more readable.

Try this code to see the difference:

```
import scala.util.{Try, Success, Failure}
import java.io._

object TrySuccessFailure extends App {

  badAdder(3) match {
case Success(i) => println(s"success, i = $i")
case Failure(t) =>
  // this works, but it's not too useful/readable
  println(t.getStackTrace.mkString("\n"))
  println("===")
  println(t.getStackTraceString)
  // this works much better
  val sw = new StringWriter
  t.printStackTrace(new PrintWriter(sw))
  println(sw.toString)
  }
  def badAdder(a: Int): Try[Int] = {
Try({
  val b = a + 1
  if (b == 3) b else {
val ioe = new IOException("Boom!")
throw new AlsException("Bummer!", ioe)
  }
})
  }
  class AlsException(s: String, e: Exception) extends Exception(s: String, e: 
Exception)
}
```


> Stop using RichException.getStackTrace it is deprecated
> ---
>
> Key: SPARK-13172
> URL: https://issues.apache.org/jira/browse/SPARK-13172
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> Throwable getStackTrace is the recommended alternative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated

2016-02-08 Thread sachin aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138423#comment-15138423
 ] 

sachin aggarwal edited comment on SPARK-13172 at 2/9/16 6:33 AM:
-

There are two ways we can proceed: the first uses printStackTrace and the second 
uses mkString().

1) printStackTrace can be encapsulated in a helper function:
{code}
def getStackTraceAsString(t: Throwable) = {
val sw = new StringWriter
t.printStackTrace(new PrintWriter(sw))
sw.toString
}
{code}

2) println(t.getStackTrace.mkString("\n"))


The mkString approach gives exactly the same string as the old getStackTraceString, 
but the output of the first approach is more readable.

Try this code to see the difference:

{code:title=TrySuccessFailure.scala|borderStyle=solid}
import scala.util.{Try, Success, Failure}
import java.io._

object TrySuccessFailure extends App {

  badAdder(3) match {
case Success(i) => println(s"success, i = $i")
case Failure(t) =>
  // this works, but it's not too useful/readable
  println(t.getStackTrace.mkString("\n"))
  println("===")
  println(t.getStackTraceString)
  // this works much better
  val sw = new StringWriter
  t.printStackTrace(new PrintWriter(sw))
  println(sw.toString)
  }
  def badAdder(a: Int): Try[Int] = {
Try({
  val b = a + 1
  if (b == 3) b else {
val ioe = new IOException("Boom!")
throw new AlsException("Bummer!", ioe)
  }
})
  }
  class AlsException(s: String, e: Exception) extends Exception(s: String, e: 
Exception)
}
{code}



was (Author: sachin aggarwal):
There are two ways we can proceed: the first uses printStackTrace and the second 
uses mkString().

1) printStackTrace can be encapsulated in a helper function:
def getStackTraceAsString(t: Throwable) = {
val sw = new StringWriter
t.printStackTrace(new PrintWriter(sw))
sw.toString
}

2) println(t.getStackTrace.mkString("\n"))


The mkString approach gives exactly the same string as the old getStackTraceString, 
but the output of the first approach is more readable.

Try this code to see the difference:

{code:title=TrySuccessFailure.scala|borderStyle=solid}
import scala.util.{Try, Success, Failure}
import java.io._

object TrySuccessFailure extends App {

  badAdder(3) match {
case Success(i) => println(s"success, i = $i")
case Failure(t) =>
  // this works, but it's not too useful/readable
  println(t.getStackTrace.mkString("\n"))
  println("===")
  println(t.getStackTraceString)
  // this works much better
  val sw = new StringWriter
  t.printStackTrace(new PrintWriter(sw))
  println(sw.toString)
  }
  def badAdder(a: Int): Try[Int] = {
Try({
  val b = a + 1
  if (b == 3) b else {
val ioe = new IOException("Boom!")
throw new AlsException("Bummer!", ioe)
  }
})
  }
  class AlsException(s: String, e: Exception) extends Exception(s: String, e: 
Exception)
}
{code}


> Stop using RichException.getStackTrace it is deprecated
> ---
>
> Key: SPARK-13172
> URL: https://issues.apache.org/jira/browse/SPARK-13172
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> Throwable getStackTrace is the recommended alternative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated

2016-02-08 Thread sachin aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138423#comment-15138423
 ] 

sachin aggarwal edited comment on SPARK-13172 at 2/9/16 6:35 AM:
-

There are two ways we can proceed: the first uses printStackTrace and the second 
uses mkString().

1) printStackTrace can be encapsulated in a helper function:
{code}
def getStackTraceAsString(t: Throwable) = {
val sw = new StringWriter
t.printStackTrace(new PrintWriter(sw))
sw.toString
}
{code}

2) println(t.getStackTrace.mkString("\n"))

The mkString approach gives exactly the same string as the old getStackTraceString, 
but the output of the first approach is more readable.
h3.Example

{code:title=TrySuccessFailure.scala|borderStyle=solid}
import scala.util.{Try, Success, Failure}
import java.io._

object TrySuccessFailure extends App {

  badAdder(3) match {
case Success(i) => println(s"success, i = $i")
case Failure(t) =>
  // this works, but it's not too useful/readable
  println(t.getStackTrace.mkString("\n"))
  println("===")
  println(t.getStackTraceString)
  // this works much better
  val sw = new StringWriter
  t.printStackTrace(new PrintWriter(sw))
  println(sw.toString)
  }
  def badAdder(a: Int): Try[Int] = {
Try({
  val b = a + 1
  if (b == 3) b else {
val ioe = new IOException("Boom!")
throw new AlsException("Bummer!", ioe)
  }
})
  }
  class AlsException(s: String, e: Exception) extends Exception(s: String, e: 
Exception)
}
{code}



was (Author: sachin aggarwal):
There are two ways we can proceed: the first uses printStackTrace and the second 
uses mkString().

1) printStackTrace can be encapsulated in a helper function:
{code}
def getStackTraceAsString(t: Throwable) = {
val sw = new StringWriter
t.printStackTrace(new PrintWriter(sw))
sw.toString
}
{code}

2) println(t.getStackTrace.mkString("\n"))


The mkString approach gives exactly the same string as the old getStackTraceString, 
but the output of the first approach is more readable.

Try this code to see the difference:

{code:title=TrySuccessFailure.scala|borderStyle=solid}
import scala.util.{Try, Success, Failure}
import java.io._

object TrySuccessFailure extends App {

  badAdder(3) match {
case Success(i) => println(s"success, i = $i")
case Failure(t) =>
  // this works, but it's not too useful/readable
  println(t.getStackTrace.mkString("\n"))
  println("===")
  println(t.getStackTraceString)
  // this works much better
  val sw = new StringWriter
  t.printStackTrace(new PrintWriter(sw))
  println(sw.toString)
  }
  def badAdder(a: Int): Try[Int] = {
Try({
  val b = a + 1
  if (b == 3) b else {
val ioe = new IOException("Boom!")
throw new AlsException("Bummer!", ioe)
  }
})
  }
  class AlsException(s: String, e: Exception) extends Exception(s: String, e: 
Exception)
}
{code}


> Stop using RichException.getStackTrace it is deprecated
> ---
>
> Key: SPARK-13172
> URL: https://issues.apache.org/jira/browse/SPARK-13172
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> Throwable getStackTrace is the recommended alternative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13054) Always post TaskEnd event for tasks in cancelled stages

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138220#comment-15138220
 ] 

Apache Spark commented on SPARK-13054:
--

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/10958

> Always post TaskEnd event for tasks in cancelled stages
> ---
>
> Key: SPARK-13054
> URL: https://issues.apache.org/jira/browse/SPARK-13054
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> {code}
> // The success case is dealt with separately below.
> // TODO: Why post it only for failed tasks in cancelled stages? Clarify 
> semantics here.
> if (event.reason != Success) {
>   val attemptId = task.stageAttemptId
>   listenerBus.post(SparkListenerTaskEnd(
> stageId, attemptId, taskType, event.reason, event.taskInfo, 
> taskMetrics))
> }
> {code}
> Today we only post task end events for canceled stages if the task failed. 
> There is no reason why we shouldn't just post it for all the tasks, including 
> the ones that succeeded. If we do that we will be able to simplify another 
> branch in the DAGScheduler, which needs a lot of simplification.
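
A minimal sketch of the proposed behaviour, reusing the names from the snippet above (not the actual patch, and not self-contained outside the DAGScheduler):

{code}
// Post the task-end event unconditionally, for failed and successful tasks
// alike, so cancelled stages report every finished task to listeners.
val attemptId = task.stageAttemptId
listenerBus.post(SparkListenerTaskEnd(
  stageId, attemptId, taskType, event.reason, event.taskInfo, taskMetrics))
{code}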



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-13054) Always post TaskEnd event for tasks in cancelled stages

2016-02-08 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-13054:
--
Comment: was deleted

(was: User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/10958)

> Always post TaskEnd event for tasks in cancelled stages
> ---
>
> Key: SPARK-13054
> URL: https://issues.apache.org/jira/browse/SPARK-13054
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> {code}
> // The success case is dealt with separately below.
> // TODO: Why post it only for failed tasks in cancelled stages? Clarify 
> semantics here.
> if (event.reason != Success) {
>   val attemptId = task.stageAttemptId
>   listenerBus.post(SparkListenerTaskEnd(
> stageId, attemptId, taskType, event.reason, event.taskInfo, 
> taskMetrics))
> }
> {code}
> Today we only post task end events for canceled stages if the task failed. 
> There is no reason why we shouldn't just post it for all the tasks, including 
> the ones that succeeded. If we do that we will be able to simplify another 
> branch in the DAGScheduler, which needs a lot of simplification.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated

2016-02-08 Thread sachin aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138423#comment-15138423
 ] 

sachin aggarwal commented on SPARK-13172:
-

There are two ways we can proceed: the first uses printStackTrace and the second 
uses mkString().

1) printStackTrace can be encapsulated in a helper function:
def getStackTraceAsString(t: Throwable) = {
val sw = new StringWriter
t.printStackTrace(new PrintWriter(sw))
sw.toString
}

2) println(t.getStackTrace.mkString("\n"))


The mkString approach gives exactly the same string as the old getStackTraceString, 
but the output of the first approach is more readable.

Try this code to see the difference:

```
import scala.util.{Try, Success, Failure}
import java.io._

object TrySuccessFailure extends App {

  badAdder(3) match {
case Success(i) => println(s"success, i = $i")
case Failure(t) =>
  // this works, but it's not too useful/readable
  println(t.getStackTrace.mkString("\n"))
  println("===")
  println(t.getStackTraceString)
  // this works much better
  val sw = new StringWriter
  t.printStackTrace(new PrintWriter(sw))
  println(sw.toString)
  }
  def badAdder(a: Int): Try[Int] = {
Try({
  val b = a + 1
  if (b == 3) b else {
val ioe = new IOException("Boom!")
throw new AlsException("Bummer!", ioe)
  }
})
  }
  class AlsException(s: String, e: Exception) extends Exception(s: String, e: 
Exception)
}
```


> Stop using RichException.getStackTrace it is deprecated
> ---
>
> Key: SPARK-13172
> URL: https://issues.apache.org/jira/browse/SPARK-13172
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> Throwable getStackTrace is the recommended alternative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13232) YARN executor node label expressions

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13232:


Assignee: Apache Spark

> YARN executor node label expressions
> 
>
> Key: SPARK-13232
> URL: https://issues.apache.org/jira/browse/SPARK-13232
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
> Environment: Scala 2.11.7,  Hadoop 2.7.2, Spark 1.6.0
>Reporter: Atkins
>Assignee: Apache Spark
>Priority: Minor
>
> Using a node label expression for executors fails to create the container 
> request and throws *InvalidContainerRequestException*.
> The code
> {code:title=AMRMClientImpl.java}
>   /**
>* Valid if a node label expression specified on container request is valid 
> or
>* not
>* 
>* @param containerRequest
>*/
>   private void checkNodeLabelExpression(T containerRequest) {
> String exp = containerRequest.getNodeLabelExpression();
> 
> if (null == exp || exp.isEmpty()) {
>   return;
> }
> // Don't support specifying >= 2 node labels in a node label expression 
> now
> if (exp.contains("&&") || exp.contains("||")) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify more than two node labels"
>   + " in a single node label expression");
> }
> 
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
> (!containerRequest.getRacks().isEmpty()))
> || 
> (containerRequest.getNodes() != null && 
> (!containerRequest.getNodes().isEmpty( {
>   throw new InvalidContainerRequestException(
>   "Cannot specify node label with rack and node");
> }
>   }
> {code}
> doesn't allow a node label together with rack and node locality.
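
A minimal sketch of a possible workaround on the Spark side, assuming the allocator simply drops node/rack locality whenever a label expression is configured (illustrative only, not the actual YarnAllocator change):

{code}
import org.apache.hadoop.yarn.api.records.{Priority, Resource}
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

object LabelAwareRequests {
  // AMRMClient rejects requests that combine a node label expression with
  // node/rack lists, so omit the locality hints when a label is set.
  def createContainerRequest(
      resource: Resource,
      nodes: Array[String],
      racks: Array[String],
      priority: Priority,
      labelExpression: Option[String]): ContainerRequest = {
    labelExpression match {
      case Some(expr) =>
        new ContainerRequest(resource, null, null, priority, true, expr)
      case None =>
        new ContainerRequest(resource, nodes, racks, priority)
    }
  }
}
{code}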



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13239) Click-Through Rate Prediction

2016-02-08 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-13239:
---

 Summary: Click-Through Rate Prediction
 Key: SPARK-13239
 URL: https://issues.apache.org/jira/browse/SPARK-13239
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Reporter: Yu Ishikawa
Priority: Minor


Apply ML Pipeline API to Click-Through Rate Prediction
https://www.kaggle.com/c/avazu-ctr-prediction
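
A minimal spark-shell sketch of what "apply the ML Pipeline API" could look like for this dataset; the column names and the feature set are assumptions about the Avazu CSV, not a finished example:

{code}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}

// Index and one-hot encode a categorical column, assemble features, then fit
// a logistic regression to predict the "click" label.
val siteIndexer = new StringIndexer().setInputCol("site_id").setOutputCol("site_idx")
val siteEncoder = new OneHotEncoder().setInputCol("site_idx").setOutputCol("site_vec")
val assembler = new VectorAssembler()
  .setInputCols(Array("site_vec", "banner_pos"))
  .setOutputCol("features")
val lr = new LogisticRegression().setLabelCol("click").setFeaturesCol("features")

val pipeline = new Pipeline().setStages(Array(siteIndexer, siteEncoder, assembler, lr))
// val model = pipeline.fit(trainingDF)  // trainingDF: DataFrame loaded from the CTR data
{code}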



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13239) Click-Through Rate Prediction

2016-02-08 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138490#comment-15138490
 ] 

Yu Ishikawa commented on SPARK-13239:
-

I'm working on this issue.

> Click-Through Rate Prediction
> -
>
> Key: SPARK-13239
> URL: https://issues.apache.org/jira/browse/SPARK-13239
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Yu Ishikawa
>Priority: Minor
>
> Apply ML Pipeline API to Click-Through Rate Prediction
> https://www.kaggle.com/c/avazu-ctr-prediction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13015) Replace example code in mllib-data-types.md using include_example

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138307#comment-15138307
 ] 

Apache Spark commented on SPARK-13015:
--

User 'keypointt' has created a pull request for this issue:
https://github.com/apache/spark/pull/11128

> Replace example code in mllib-data-types.md using include_example
> -
>
> Key: SPARK-13015
> URL: https://issues.apache.org/jira/browse/SPARK-13015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Xusen Yin
>Priority: Minor
>  Labels: starter
>
> See examples in other finished sub-JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12329) spark-sql prints out SET commands to stdout instead of stderr

2016-02-08 Thread Rishabh Bhardwaj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138340#comment-15138340
 ] 

Rishabh Bhardwaj commented on SPARK-12329:
--

[~ashwinshankar77] Are you working on this or can I take this up?

> spark-sql prints out SET commands to stdout instead of stderr
> -
>
> Key: SPARK-12329
> URL: https://issues.apache.org/jira/browse/SPARK-12329
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Ashwin Shankar
>Priority: Minor
>
> When I run "$spark-sql -f ", I see that a few "SET key value" messages 
> get printed on stdout instead of stderr. These messages should go to stderr.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12329) spark-sql prints out SET commands to stdout instead of stderr

2016-02-08 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138381#comment-15138381
 ] 

Ashwin Shankar commented on SPARK-12329:


Yep, feel free to take it. 

> spark-sql prints out SET commands to stdout instead of stderr
> -
>
> Key: SPARK-12329
> URL: https://issues.apache.org/jira/browse/SPARK-12329
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Ashwin Shankar
>Priority: Minor
>
> When I run "$spark-sql -f ", I see that a few "SET key value" messages 
> get printed on stdout instead of stderr. These messages should go to stderr.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-12357) Implement unhandledFilter interface for JDBC

2016-02-08 Thread Luciano Resende (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luciano Resende closed SPARK-12357.
---
Resolution: Duplicate

> Implement unhandledFilter interface for JDBC
> 
>
> Key: SPARK-12357
> URL: https://issues.apache.org/jira/browse/SPARK-12357
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Hyukjin Kwon
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13238) Add ganglia dmax parameter

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13238:


Assignee: (was: Apache Spark)

> Add ganglia dmax parameter
> --
>
> Key: SPARK-13238
> URL: https://issues.apache.org/jira/browse/SPARK-13238
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Ekasit Kijsipongse
>Priority: Minor
>
> The current Ganglia reporter doesn't set a metric expiration time (dmax), so the 
> metrics of all finished applications remain displayed in the Ganglia web UI 
> indefinitely. The dmax parameter lets the user set the lifetime of the metrics. 
> The default value is 0, for compatibility with previous versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13238) Add ganglia dmax parameter

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138275#comment-15138275
 ] 

Apache Spark commented on SPARK-13238:
--

User 'ekasitk' has created a pull request for this issue:
https://github.com/apache/spark/pull/11127

> Add ganglia dmax parameter
> --
>
> Key: SPARK-13238
> URL: https://issues.apache.org/jira/browse/SPARK-13238
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Ekasit Kijsipongse
>Priority: Minor
>
> The current Ganglia reporter doesn't set a metric expiration time (dmax), so the 
> metrics of all finished applications remain displayed in the Ganglia web UI 
> indefinitely. The dmax parameter lets the user set the lifetime of the metrics. 
> The default value is 0, for compatibility with previous versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13238) Add ganglia dmax parameter

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13238:


Assignee: Apache Spark

> Add ganglia dmax parameter
> --
>
> Key: SPARK-13238
> URL: https://issues.apache.org/jira/browse/SPARK-13238
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Ekasit Kijsipongse
>Assignee: Apache Spark
>Priority: Minor
>
> The current Ganglia reporter doesn't set a metric expiration time (dmax), so the 
> metrics of all finished applications remain displayed in the Ganglia web UI 
> indefinitely. The dmax parameter lets the user set the lifetime of the metrics. 
> The default value is 0, for compatibility with previous versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138297#comment-15138297
 ] 

Xiao Li edited comment on SPARK-13219 at 2/9/16 3:57 AM:
-

It should work without Join clauses, as long as these predicates are pushed 
into the join/filter conditions.

However, the current PR is unable to infer the conditions when needing multiple 
hops. Actually, I am glad to work on it, if needed. 

Let me at [~rxin] [~marmbrus]. 


was (Author: smilegator):
It should work without Join clauses, as long as these predicates are pushed 
into the join conditions. If they are only in filters, this is not applicable 
and we should not push down the inferred conditions.

However, the current PR is unable to infer the conditions when needing multiple 
hops. Actually, I am glad to work on it, if needed. 

Let me at [~rxin] [~marmbrus]. 

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on the attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13015) Replace example code in mllib-data-types.md using include_example

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13015:


Assignee: (was: Apache Spark)

> Replace example code in mllib-data-types.md using include_example
> -
>
> Key: SPARK-13015
> URL: https://issues.apache.org/jira/browse/SPARK-13015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Xusen Yin
>Priority: Minor
>  Labels: starter
>
> See examples in other finished sub-JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13015) Replace example code in mllib-data-types.md using include_example

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13015:


Assignee: Apache Spark

> Replace example code in mllib-data-types.md using include_example
> -
>
> Key: SPARK-13015
> URL: https://issues.apache.org/jira/browse/SPARK-13015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Xusen Yin
>Assignee: Apache Spark
>Priority: Minor
>  Labels: starter
>
> See examples in other finished sub-JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated

2016-02-08 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138447#comment-15138447
 ] 

Jakob Odersky commented on SPARK-13172:
---

Cool, thanks for the snippet! I agree, the first approach looks a lot better.

> Stop using RichException.getStackTrace it is deprecated
> ---
>
> Key: SPARK-13172
> URL: https://issues.apache.org/jira/browse/SPARK-13172
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> Throwable getStackTrace is the recommended alternative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13237) Generate broadcast outer join

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13237:


Assignee: Apache Spark  (was: Davies Liu)

> Generate broadcast outer join
> -
>
> Key: SPARK-13237
> URL: https://issues.apache.org/jira/browse/SPARK-13237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13237) Generate broadcast outer join

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13237:


Assignee: Davies Liu  (was: Apache Spark)

> Generate broadcast outer join
> -
>
> Key: SPARK-13237
> URL: https://issues.apache.org/jira/browse/SPARK-13237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13237) Generate broadcast outer join

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138488#comment-15138488
 ] 

Apache Spark commented on SPARK-13237:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/11130

> Generate broadcast outer join
> -
>
> Key: SPARK-13237
> URL: https://issues.apache.org/jira/browse/SPARK-13237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-12921) Use SparkHadoopUtil reflection to access TaskAttemptContext in SpecificParquetRecordReaderBase

2016-02-08 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen reopened SPARK-12921:


Looks like there's still one more spot that I missed, so I'm going to reopen 
this and will submit a one-line fixup patch.

> Use SparkHadoopUtil reflection to access TaskAttemptContext in 
> SpecificParquetRecordReaderBase
> --
>
> Key: SPARK-12921
> URL: https://issues.apache.org/jira/browse/SPARK-12921
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 1.6.1
>
>
> It looks like there's one place left in the codebase, 
> SpecificParquetRecordReaderBase,  where we didn't use SparkHadoopUtil's 
> reflective accesses of TaskAttemptContext methods, creating problems when 
> using a single Spark artifact with both Hadoop 1.x and 2.x.
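
For context, a generic sketch of the reflective pattern in question (illustrative only; the issue is about routing this call through SparkHadoopUtil's existing helpers):

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.TaskAttemptContext

object TaskAttemptContextCompat {
  // Call getConfiguration reflectively so the same bytecode works whether
  // TaskAttemptContext is a class (Hadoop 1.x) or an interface (Hadoop 2.x).
  def getConfiguration(context: TaskAttemptContext): Configuration = {
    val method = context.getClass.getMethod("getConfiguration")
    method.invoke(context).asInstanceOf[Configuration]
  }
}
{code}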



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138289#comment-15138289
 ] 

Xiao Li edited comment on SPARK-13219 at 2/9/16 3:42 AM:
-

Could you wait for me to fix another issue? When I tried your query, I found a 
bug in the latest build. I need to fix it now. Thanks!


was (Author: smilegator):
Could you wait for me to fix another issue? When I tried your query, I found a 
bug. I need to fix it now. Thanks!

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on the attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138289#comment-15138289
 ] 

Xiao Li commented on SPARK-13219:
-

Could you wait for me to fix another issue? When I tried your query, I found a 
bug. I need to fix it now. Thanks!

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on the attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138297#comment-15138297
 ] 

Xiao Li edited comment on SPARK-13219 at 2/9/16 3:51 AM:
-

It should work without Join clauses, as long as these predicates are pushed 
into the join conditions. If they are only in filters, this is not applicable 
and we should not push down the inferred conditions.

However, the current PR is unable to infer the conditions when needing multiple 
hops. Actually, I am glad to work on it, if needed. 

Let me at [~rxin] [~marmbrus]. 


was (Author: smilegator):
It should work without Join clauses, as long as these predicates are pushed 
into the join conditions. If they are only in filters, this is not applicable 
and we should not push down the inferred conditions.

However, the current PR is unable to infer the conditions when needing multiple 
hops. Actually, I am glad to work on it, if needed. 

Let me at [~rxin] [~marmbrus]. 

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on the attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13232) YARN executor node label expressions

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13232:


Assignee: (was: Apache Spark)

> YARN executor node label expressions
> 
>
> Key: SPARK-13232
> URL: https://issues.apache.org/jira/browse/SPARK-13232
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
> Environment: Scala 2.11.7,  Hadoop 2.7.2, Spark 1.6.0
>Reporter: Atkins
>Priority: Minor
>
> Using a node label expression for executors causes the container request to fail 
> and throws *InvalidContainerRequestException*.
> The code
> {code:title=AMRMClientImpl.java}
>   /**
>* Valid if a node label expression specified on container request is valid 
> or
>* not
>* 
>* @param containerRequest
>*/
>   private void checkNodeLabelExpression(T containerRequest) {
> String exp = containerRequest.getNodeLabelExpression();
> 
> if (null == exp || exp.isEmpty()) {
>   return;
> }
> // Don't support specifying >= 2 node labels in a node label expression 
> now
> if (exp.contains("&&") || exp.contains("||")) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify more than two node labels"
>   + " in a single node label expression");
> }
> 
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
> (!containerRequest.getRacks().isEmpty()))
> || 
> (containerRequest.getNodes() != null && 
> (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify node label with rack and node");
> }
>   }
> {code}
> doesn't allow node label with rack and node.
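
One possible direction, sketched below as a hypothetical helper (illustrative names, not YarnAllocator's actual code): when spark.yarn.executor.nodeLabelExpression is set, drop the node/rack locality hints so the Hadoop check above does not reject the request. The sketch assumes the Hadoop 2.6+ ContainerRequest constructor that accepts a node label expression.

{code:title=NodeLabelRequest.scala}
// Hypothetical sketch only; names are illustrative and not Spark's.
import org.apache.hadoop.yarn.api.records.{Priority, Resource}
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

object NodeLabelRequest {
  def create(
      resource: Resource,
      nodes: Array[String],
      racks: Array[String],
      priority: Priority,
      labelExpression: Option[String]): ContainerRequest = {
    labelExpression match {
      case Some(expr) =>
        // A labeled request must not carry node/rack locality, otherwise
        // AMRMClientImpl#checkNodeLabelExpression throws
        // InvalidContainerRequestException("Cannot specify node label with rack and node").
        new ContainerRequest(resource, null, null, priority, true, expr)
      case None =>
        new ContainerRequest(resource, nodes, racks, priority)
    }
  }
}
{code}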



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13232) YARN executor node label expressions

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138438#comment-15138438
 ] 

Apache Spark commented on SPARK-13232:
--

User 'AtkinsChang' has created a pull request for this issue:
https://github.com/apache/spark/pull/11129

> YARN executor node label expressions
> 
>
> Key: SPARK-13232
> URL: https://issues.apache.org/jira/browse/SPARK-13232
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
> Environment: Scala 2.11.7,  Hadoop 2.7.2, Spark 1.6.0
>Reporter: Atkins
>Priority: Minor
>
> Using a node label expression for executors causes the container request to fail 
> and throws *InvalidContainerRequestException*.
> The code
> {code:title=AMRMClientImpl.java}
>   /**
>* Valid if a node label expression specified on container request is valid 
> or
>* not
>* 
>* @param containerRequest
>*/
>   private void checkNodeLabelExpression(T containerRequest) {
> String exp = containerRequest.getNodeLabelExpression();
> 
> if (null == exp || exp.isEmpty()) {
>   return;
> }
> // Don't support specifying >= 2 node labels in a node label expression 
> now
> if (exp.contains("&&") || exp.contains("||")) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify more than two node labels"
>   + " in a single node label expression");
> }
> 
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
> (!containerRequest.getRacks().isEmpty()))
> || 
> (containerRequest.getNodes() != null && 
> (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify node label with rack and node");
> }
>   }
> {code}
> doesn't allow node label with rack and node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13013) Replace example code in mllib-clustering.md using include_example

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137004#comment-15137004
 ] 

Apache Spark commented on SPARK-13013:
--

User 'keypointt' has created a pull request for this issue:
https://github.com/apache/spark/pull/6

> Replace example code in mllib-clustering.md using include_example
> -
>
> Key: SPARK-13013
> URL: https://issues.apache.org/jira/browse/SPARK-13013
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Xusen Yin
>Priority: Minor
>  Labels: starter
>
> See examples in other finished sub-JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13232) YARN executor node label expressions bug

2016-02-08 Thread Atkins (JIRA)
Atkins created SPARK-13232:
--

 Summary: YARN executor node label expressions bug
 Key: SPARK-13232
 URL: https://issues.apache.org/jira/browse/SPARK-13232
 Project: Spark
  Issue Type: Bug
  Components: YARN
 Environment: Scala 2.11.7,  Hadoop 2.7.2, Spark 1.6.0
Reporter: Atkins


Using a node label expression for executors causes the container request to fail 
and throws *InvalidContainerRequestException*.
The code
{code:title=AMRMClientImpl.java}
  /**
   * Valid if a node label expression specified on container request is valid or
   * not
   * 
   * @param containerRequest
   */
  private void checkNodeLabelExpression(T containerRequest) {
String exp = containerRequest.getNodeLabelExpression();

if (null == exp || exp.isEmpty()) {
  return;
}

// Don't support specifying >= 2 node labels in a node label expression now
if (exp.contains("&&") || exp.contains("||")) {
  throw new InvalidContainerRequestException(
  "Cannot specify more than two node labels"
  + " in a single node label expression");
}

// Don't allow specify node label against ANY request
if ((containerRequest.getRacks() != null && 
(!containerRequest.getRacks().isEmpty()))
|| 
(containerRequest.getNodes() != null && 
(!containerRequest.getNodes().isEmpty()))) {
  throw new InvalidContainerRequestException(
  "Cannot specify node label with rack and node");
}
  }
{code}
doesn't allow node label with rack and node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13231) Rename Accumulable.countFailedValues to Accumulable.includeValuesOfFailedTasks and make it a user facing API.

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13231:


Assignee: (was: Apache Spark)

> Rename Accumulable.countFailedValues to 
> Accumulable.includeValuesOfFailedTasks and make it a user facing API.
> -
>
> Key: SPARK-13231
> URL: https://issues.apache.org/jira/browse/SPARK-13231
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Prashant Sharma
>Priority: Minor
>
> Rename Accumulable.countFailedValues to 
> Accumulable.includeValuesOfFailedTasks (or includeFailedTasks). I liked the 
> longer version though. 
> Exposing it to users has no disadvantage I can think of, but it can be useful 
> for them. One scenario can be a user-defined metric.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13231) Rename Accumulable.countFailedValues to Accumulable.includeValuesOfFailedTasks and make it a user facing API.

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13231:


Assignee: Apache Spark

> Rename Accumulable.countFailedValues to 
> Accumulable.includeValuesOfFailedTasks and make it a user facing API.
> -
>
> Key: SPARK-13231
> URL: https://issues.apache.org/jira/browse/SPARK-13231
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Prashant Sharma
>Assignee: Apache Spark
>Priority: Minor
>
> Rename Accumulable.countFailedValues to 
> Accumulable.includeValuesOfFailedTasks (or includeFailedTasks). I liked the 
> longer version though. 
> Exposing it to users has no disadvantage I can think of, but it can be useful 
> for them. One scenario can be a user-defined metric.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13231) Rename Accumulable.countFailedValues to Accumulable.includeValuesOfFailedTasks and make it a user facing API.

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136944#comment-15136944
 ] 

Apache Spark commented on SPARK-13231:
--

User 'ScrapCodes' has created a pull request for this issue:
https://github.com/apache/spark/pull/5

> Rename Accumulable.countFailedValues to 
> Accumulable.includeValuesOfFailedTasks and make it a user facing API.
> -
>
> Key: SPARK-13231
> URL: https://issues.apache.org/jira/browse/SPARK-13231
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Prashant Sharma
>Priority: Minor
>
> Rename Accumulable.countFailedValues to 
> Accumulable.includeValuesOfFailedTasks (or includeFailedTasks). I liked the 
> longer version though. 
> Exposing it to users has no disadvantage I can think of, but it can be useful 
> for them. One scenario can be a user-defined metric.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13013) Replace example code in mllib-clustering.md using include_example

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13013:


Assignee: Apache Spark

> Replace example code in mllib-clustering.md using include_example
> -
>
> Key: SPARK-13013
> URL: https://issues.apache.org/jira/browse/SPARK-13013
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Xusen Yin
>Assignee: Apache Spark
>Priority: Minor
>  Labels: starter
>
> See examples in other finished sub-JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13013) Replace example code in mllib-clustering.md using include_example

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13013:


Assignee: (was: Apache Spark)

> Replace example code in mllib-clustering.md using include_example
> -
>
> Key: SPARK-13013
> URL: https://issues.apache.org/jira/browse/SPARK-13013
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Xusen Yin
>Priority: Minor
>  Labels: starter
>
> See examples in other finished sub-JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.

2016-02-08 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137000#comment-15137000
 ] 

Thomas Graves commented on SPARK-12316:
---

you say "endless cycle call" do you mean the application master hangs?  It 
seems like it should throw and if the application is done it should just exit 
anyway since the AM is just calling stop on it.I just want to clarify what 
is happening because I assume even if you wait a minute you could still hit the 
same condition once when its tearing down.

> Stack overflow with endless call of `Delegation token thread` when 
> application end.
> ---
>
> Key: SPARK-12316
> URL: https://issues.apache.org/jira/browse/SPARK-12316
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Attachments: 20151210045149.jpg, 20151210045533.jpg
>
>
> When the application ends, the AM will clean the staging dir.
> But if the driver then triggers a delegation token update, it can't find 
> the right token file and it will endlessly call the method 
> 'updateCredentialsIfRequired'.
> This leads to a StackOverflowError.
> !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg!
> !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg!
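
For illustration only (generic code, not Spark's AMDelegationTokenRenewer), the sketch below contrasts retrying by re-entering the method, which grows the stack on every failed attempt and eventually overflows, with rescheduling the retry on an executor, which keeps the stack flat:

{code}
import java.util.concurrent.{Executors, TimeUnit}

object TokenUpdateSketch {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Recursive retry: each failure adds a stack frame, eventually a StackOverflowError.
  def updateRecursively(): Unit = {
    try readTokenFile() catch {
      case _: Exception => updateRecursively()
    }
  }

  // Scheduled retry: each attempt runs as a new task, so the stack does not grow.
  def updateScheduled(): Unit = {
    try readTokenFile() catch {
      case _: Exception =>
        scheduler.schedule(new Runnable {
          override def run(): Unit = updateScheduled()
        }, 1, TimeUnit.MINUTES)
    }
  }

  // Placeholder for reading the delegation token file from the staging dir.
  private def readTokenFile(): Unit = {}
}
{code}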



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13014) Replace example code in mllib-collaborative-filtering.md using include_example

2016-02-08 Thread Xin Ren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137007#comment-15137007
 ] 

Xin Ren commented on SPARK-13014:
-

I'm working on this one, thanks :)

> Replace example code in mllib-collaborative-filtering.md using include_example
> --
>
> Key: SPARK-13014
> URL: https://issues.apache.org/jira/browse/SPARK-13014
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Xusen Yin
>Priority: Minor
>  Labels: starter
>
> See examples in other finished sub-JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-02-08 Thread Rama Mullapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137029#comment-15137029
 ] 

Rama Mullapudi commented on SPARK-12177:


Does the update include Kerberos support, since 0.9 producers and consumers now 
support Kerberos (SASL) and SSL? 

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released and it introduces a new consumer API that is not 
> compatible with the old one. So, I added the new consumer API in separate 
> classes in the package org.apache.spark.streaming.kafka.v09 with the changed API. I 
> didn't remove the old classes, for backward compatibility. Users will not need 
> to change their old Spark applications when they upgrade to a new Spark version.
> Please review my changes.
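
For context, a hedged sketch of the Kafka 0.9 new consumer API that the description refers to (plain Kafka client usage with made-up broker and topic names, not the proposed Spark streaming integration):

{code}
import java.util.{Arrays, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("group.id", "example-group")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

// The 0.9 consumer replaces the old high-level/simple consumers with a single client.
val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Arrays.asList("example-topic"))

val records = consumer.poll(1000)
val it = records.iterator()
while (it.hasNext) {
  val r = it.next()
  println(s"${r.topic()} offset=${r.offset()} value=${r.value()}")
}
consumer.close()
{code}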



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13232) YARN executor node label expressions

2016-02-08 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13232:
--
  Priority: Minor  (was: Major)
Issue Type: Improvement  (was: Bug)
   Summary: YARN executor node label expressions  (was: YARN executor node 
label expressions bug)

What are you specifically referring to in this code -- what change are you 
proposing?

As far as I can tell you're referring to something that's just not supported 
yet, namely conjunctions?

> YARN executor node label expressions
> 
>
> Key: SPARK-13232
> URL: https://issues.apache.org/jira/browse/SPARK-13232
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
> Environment: Scala 2.11.7,  Hadoop 2.7.2, Spark 1.6.0
>Reporter: Atkins
>Priority: Minor
>
> Using a node label expression for executors causes the container request to fail 
> and throws *InvalidContainerRequestException*.
> The code
> {code:title=AMRMClientImpl.java}
>   /**
>* Valid if a node label expression specified on container request is valid 
> or
>* not
>* 
>* @param containerRequest
>*/
>   private void checkNodeLabelExpression(T containerRequest) {
> String exp = containerRequest.getNodeLabelExpression();
> 
> if (null == exp || exp.isEmpty()) {
>   return;
> }
> // Don't support specifying >= 2 node labels in a node label expression 
> now
> if (exp.contains("&&") || exp.contains("||")) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify more than two node labels"
>   + " in a single node label expression");
> }
> 
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
> (!containerRequest.getRacks().isEmpty()))
> || 
> (containerRequest.getNodes() != null && 
> (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify node label with rack and node");
> }
>   }
> {code}
> doesn't allow node label with rack and node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.

2016-02-08 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-12316:
--
Assignee: SaintBacchus

> Stack overflow with endless call of `Delegation token thread` when 
> application end.
> ---
>
> Key: SPARK-12316
> URL: https://issues.apache.org/jira/browse/SPARK-12316
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Attachments: 20151210045149.jpg, 20151210045533.jpg
>
>
> When the application ends, the AM will clean the staging dir.
> But if the driver then triggers a delegation token update, it can't find 
> the right token file and it will endlessly call the method 
> 'updateCredentialsIfRequired'.
> This leads to a StackOverflowError.
> !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg!
> !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-13219:

Component/s: (was: Spark Core)
 SQL

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7889) Jobs progress of apps on complete page of HistoryServer shows uncompleted

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137336#comment-15137336
 ] 

Apache Spark commented on SPARK-7889:
-

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/8

> Jobs progress of apps on complete page of HistoryServer shows uncompleted
> -
>
> Key: SPARK-7889
> URL: https://issues.apache.org/jira/browse/SPARK-7889
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: meiyoula
>Priority: Minor
>
> When running SparkPi with 2000 tasks and clicking into the app on the incomplete 
> page, the job progress shows 400/2000. After the app is completed, the app 
> moves from the incomplete page to the complete page, but clicking into the app, the job 
> progress still shows 400/2000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137333#comment-15137333
 ] 

Xiao Li commented on SPARK-13219:
-

Welcome

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6305) Add support for log4j 2.x to Spark

2016-02-08 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137369#comment-15137369
 ] 

Sean Owen commented on SPARK-6305:
--

I've started working on this, and it's as awful a dependency mess as you'd 
imagine.

> Add support for log4j 2.x to Spark
> --
>
> Key: SPARK-6305
> URL: https://issues.apache.org/jira/browse/SPARK-6305
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Tal Sliwowicz
>Priority: Minor
>
> log4j 2 requires replacing the slf4j binding and adding the log4j jars in the 
> classpath. Since there are shaded jars, it must be done during the build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Abhinav Chawade (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137330#comment-15137330
 ] 

Abhinav Chawade commented on SPARK-13219:
-

Thanks Xiao. I will pull in the request and see how it performs.

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137282#comment-15137282
 ] 

Xiao Li commented on SPARK-13219:
-

See this PR: https://github.com/apache/spark/pull/10490. 

Let me know if you hit any bug. Thanks!

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13117) WebUI should use the local ip not 0.0.0.0

2016-02-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137320#comment-15137320
 ] 

Devaraj K commented on SPARK-13117:
---

Thanks [~jjordan] for reporting. I would like to provide a PR if you are not 
planning to work on this. Please let me know. Thanks.

> WebUI should use the local ip not 0.0.0.0
> -
>
> Key: SPARK-13117
> URL: https://issues.apache.org/jira/browse/SPARK-13117
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: Jeremiah Jordan
>
> When SPARK_LOCAL_IP is set, everything seems to correctly bind to and use that IP 
> except the WebUI. The WebUI should use SPARK_LOCAL_IP instead of always using 
> 0.0.0.0.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/WebUI.scala#L137
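
A minimal sketch of the requested behaviour (illustrative only, not the actual WebUI.scala change): resolve the bind host from SPARK_LOCAL_IP when it is set and only fall back to the wildcard address otherwise.

{code}
// Illustrative: pick the UI bind host from SPARK_LOCAL_IP instead of hard-coding 0.0.0.0.
val bindHost: String = sys.env.get("SPARK_LOCAL_IP").filter(_.nonEmpty).getOrElse("0.0.0.0")
println(s"Web UI would bind to $bindHost")
{code}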



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12455) Add ExpressionDescription to window functions

2016-02-08 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-12455.
---
   Resolution: Resolved
Fix Version/s: 2.0.0

> Add ExpressionDescription to window functions
> -
>
> Key: SPARK-12455
> URL: https://issues.apache.org/jira/browse/SPARK-12455
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Herman van Hovell
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13117) WebUI should use the local ip not 0.0.0.0

2016-02-08 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137353#comment-15137353
 ] 

Jeremiah Jordan commented on SPARK-13117:
-

go for it.

> WebUI should use the local ip not 0.0.0.0
> -
>
> Key: SPARK-13117
> URL: https://issues.apache.org/jira/browse/SPARK-13117
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: Jeremiah Jordan
>
> When SPARK_LOCAL_IP is set, everything seems to correctly bind to and use that IP 
> except the WebUI. The WebUI should use SPARK_LOCAL_IP instead of always using 
> 0.0.0.0.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/WebUI.scala#L137



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13016) Replace example code in mllib-dimensionality-reduction.md using include_example

2016-02-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137310#comment-15137310
 ] 

Devaraj K commented on SPARK-13016:
---

I am working on this and will provide a PR for it. Thanks.

> Replace example code in mllib-dimensionality-reduction.md using 
> include_example
> ---
>
> Key: SPARK-13016
> URL: https://issues.apache.org/jira/browse/SPARK-13016
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Xusen Yin
>Priority: Minor
>  Labels: starter
>
> See examples in other finished sub-JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13235) Remove extra Distinct in Union Distinct

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137682#comment-15137682
 ] 

Apache Spark commented on SPARK-13235:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/11120

> Remove extra Distinct in Union Distinct
> ---
>
> Key: SPARK-13235
> URL: https://issues.apache.org/jira/browse/SPARK-13235
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Union Distinct has two Distinct operators, which generate two Aggregates in the plan.
> {code}
> sql("select * from t0 union select * from t0").explain(true)
> {code}
> {code}
> == Parsed Logical Plan ==
> 'Project [unresolvedalias(*,None)]
> +- 'Subquery u_2
>+- 'Distinct
>   +- 'Project [unresolvedalias(*,None)]
>  +- 'Subquery u_1
> +- 'Distinct
>+- 'Union
>   :- 'Project [unresolvedalias(*,None)]
>   :  +- 'UnresolvedRelation `t0`, None
>   +- 'Project [unresolvedalias(*,None)]
>  +- 'UnresolvedRelation `t0`, None
> == Analyzed Logical Plan ==
> id: bigint
> Project [id#16L]
> +- Subquery u_2
>+- Distinct
>   +- Project [id#16L]
>  +- Subquery u_1
> +- Distinct
>+- Union
>   :- Project [id#16L]
>   :  +- Subquery t0
>   : +- Relation[id#16L] ParquetRelation
>   +- Project [id#16L]
>  +- Subquery t0
> +- Relation[id#16L] ParquetRelation
> == Optimized Logical Plan ==
> Aggregate [id#16L], [id#16L]
> +- Aggregate [id#16L], [id#16L]
>+- Union
>   :- Project [id#16L]
>   :  +- Relation[id#16L] ParquetRelation
>   +- Project [id#16L]
>  +- Relation[id#16L] ParquetRelation
> {code}
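
A minimal sketch of the kind of simplification this asks for, written as a Catalyst rule; the rule name is illustrative, and the actual fix may instead avoid generating the second Distinct during analysis or collapse the duplicate Aggregates:

{code}
import org.apache.spark.sql.catalyst.plans.logical.{Distinct, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// A Distinct directly on top of another Distinct adds nothing, so keep only one.
object RemoveRedundantDistinct extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Distinct(child @ Distinct(_)) => child
  }
}
{code}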



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13235) Remove extra Distinct in Union Distinct

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13235:


Assignee: Apache Spark

> Remove extra Distinct in Union Distinct
> ---
>
> Key: SPARK-13235
> URL: https://issues.apache.org/jira/browse/SPARK-13235
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> Union Distinct has two Distinct operators, which generate two Aggregates in the plan.
> {code}
> sql("select * from t0 union select * from t0").explain(true)
> {code}
> {code}
> == Parsed Logical Plan ==
> 'Project [unresolvedalias(*,None)]
> +- 'Subquery u_2
>+- 'Distinct
>   +- 'Project [unresolvedalias(*,None)]
>  +- 'Subquery u_1
> +- 'Distinct
>+- 'Union
>   :- 'Project [unresolvedalias(*,None)]
>   :  +- 'UnresolvedRelation `t0`, None
>   +- 'Project [unresolvedalias(*,None)]
>  +- 'UnresolvedRelation `t0`, None
> == Analyzed Logical Plan ==
> id: bigint
> Project [id#16L]
> +- Subquery u_2
>+- Distinct
>   +- Project [id#16L]
>  +- Subquery u_1
> +- Distinct
>+- Union
>   :- Project [id#16L]
>   :  +- Subquery t0
>   : +- Relation[id#16L] ParquetRelation
>   +- Project [id#16L]
>  +- Subquery t0
> +- Relation[id#16L] ParquetRelation
> == Optimized Logical Plan ==
> Aggregate [id#16L], [id#16L]
> +- Aggregate [id#16L], [id#16L]
>+- Union
>   :- Project [id#16L]
>   :  +- Relation[id#16L] ParquetRelation
>   +- Project [id#16L]
>  +- Relation[id#16L] ParquetRelation
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13235) Remove extra Distinct in Union

2016-02-08 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-13235:

Summary: Remove extra Distinct in Union  (was: Remove extra Distinct in 
Union Distinct)

> Remove extra Distinct in Union
> --
>
> Key: SPARK-13235
> URL: https://issues.apache.org/jira/browse/SPARK-13235
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Union Distinct has two Distinct operators, which generate two Aggregates in the plan.
> {code}
> sql("select * from t0 union select * from t0").explain(true)
> {code}
> {code}
> == Parsed Logical Plan ==
> 'Project [unresolvedalias(*,None)]
> +- 'Subquery u_2
>+- 'Distinct
>   +- 'Project [unresolvedalias(*,None)]
>  +- 'Subquery u_1
> +- 'Distinct
>+- 'Union
>   :- 'Project [unresolvedalias(*,None)]
>   :  +- 'UnresolvedRelation `t0`, None
>   +- 'Project [unresolvedalias(*,None)]
>  +- 'UnresolvedRelation `t0`, None
> == Analyzed Logical Plan ==
> id: bigint
> Project [id#16L]
> +- Subquery u_2
>+- Distinct
>   +- Project [id#16L]
>  +- Subquery u_1
> +- Distinct
>+- Union
>   :- Project [id#16L]
>   :  +- Subquery t0
>   : +- Relation[id#16L] ParquetRelation
>   +- Project [id#16L]
>  +- Subquery t0
> +- Relation[id#16L] ParquetRelation
> == Optimized Logical Plan ==
> Aggregate [id#16L], [id#16L]
> +- Aggregate [id#16L], [id#16L]
>+- Union
>   :- Project [id#16L]
>   :  +- Relation[id#16L] ParquetRelation
>   +- Project [id#16L]
>  +- Relation[id#16L] ParquetRelation
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13235) Remove extra Distinct in Union Distinct

2016-02-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13235:


Assignee: (was: Apache Spark)

> Remove extra Distinct in Union Distinct
> ---
>
> Key: SPARK-13235
> URL: https://issues.apache.org/jira/browse/SPARK-13235
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Union Distinct has two Distinct operators, which generate two Aggregates in the plan.
> {code}
> sql("select * from t0 union select * from t0").explain(true)
> {code}
> {code}
> == Parsed Logical Plan ==
> 'Project [unresolvedalias(*,None)]
> +- 'Subquery u_2
>+- 'Distinct
>   +- 'Project [unresolvedalias(*,None)]
>  +- 'Subquery u_1
> +- 'Distinct
>+- 'Union
>   :- 'Project [unresolvedalias(*,None)]
>   :  +- 'UnresolvedRelation `t0`, None
>   +- 'Project [unresolvedalias(*,None)]
>  +- 'UnresolvedRelation `t0`, None
> == Analyzed Logical Plan ==
> id: bigint
> Project [id#16L]
> +- Subquery u_2
>+- Distinct
>   +- Project [id#16L]
>  +- Subquery u_1
> +- Distinct
>+- Union
>   :- Project [id#16L]
>   :  +- Subquery t0
>   : +- Relation[id#16L] ParquetRelation
>   +- Project [id#16L]
>  +- Subquery t0
> +- Relation[id#16L] ParquetRelation
> == Optimized Logical Plan ==
> Aggregate [id#16L], [id#16L]
> +- Aggregate [id#16L], [id#16L]
>+- Union
>   :- Project [id#16L]
>   :  +- Relation[id#16L] ParquetRelation
>   +- Project [id#16L]
>  +- Relation[id#16L] ParquetRelation
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12505) Pushdown a Limit on top of an Outer-Join

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137710#comment-15137710
 ] 

Apache Spark commented on SPARK-12505:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/11121

> Pushdown a Limit on top of an Outer-Join
> 
>
> Key: SPARK-12505
> URL: https://issues.apache.org/jira/browse/SPARK-12505
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Xiao Li
>
> "Rule that applies to a Limit on top of an OUTER Join. The original Limit 
> won't go away after applying this rule, but additional Limit node(s) will be 
> created on top of the outer-side child (or children if it's a FULL OUTER 
> Join). "
> – from https://issues.apache.org/jira/browse/CALCITE-832
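
To make the quoted rule concrete, a hedged sketch with made-up tables: for a LEFT OUTER join an extra Limit can be placed on the left (outer-side) child, because every left row yields at least one output row, and the original top-level Limit is kept for correctness.

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("limit-over-outer-join").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
sc.parallelize(1 to 100).toDF("id").registerTempTable("t1")
sc.parallelize(50 to 150).toDF("id").registerTempTable("t2")

// Original query: the Limit sits on top of the outer join.
sqlContext.sql(
  "SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id LIMIT 10").explain(true)

// After the rewrite the outer-side child is limited as well, bounding its input,
// while the outermost LIMIT still decides the final row count.
sqlContext.sql(
  """SELECT * FROM (SELECT * FROM t1 LIMIT 10) t1l
    |LEFT OUTER JOIN t2 ON t1l.id = t2.id
    |LIMIT 10
  """.stripMargin).explain(true)
{code}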



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12503) Pushdown a Limit on top of a Union

2016-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137709#comment-15137709
 ] 

Apache Spark commented on SPARK-12503:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/11121

> Pushdown a Limit on top of a Union
> --
>
> Key: SPARK-12503
> URL: https://issues.apache.org/jira/browse/SPARK-12503
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer, SQL
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Xiao Li
>
> "Rule that applies to a Limit on top of a Union. The original Limit won't go 
> away after applying this rule, but additional Limit nodes will be created on 
> top of each child of Union, so that these children produce less rows and 
> Limit can be further optimized for children Relations."
> -- from https://issues.apache.org/jira/browse/CALCITE-832
> Also, the same topic in Hive: https://issues.apache.org/jira/browse/HIVE-11775
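
A hedged sketch of the same idea with made-up tables, using UNION ALL (which matches the Union operator the rule targets): each child also gets a Limit, so neither side needs to produce more than the requested number of rows, and the outermost LIMIT is kept.

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("limit-over-union").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
sc.parallelize(1 to 100).toDF("id").registerTempTable("t1")
sc.parallelize(1 to 100).toDF("id").registerTempTable("t2")

// Original query: the Limit sits on top of the Union.
sqlContext.sql(
  "SELECT id FROM (SELECT id FROM t1 UNION ALL SELECT id FROM t2) u LIMIT 10").explain(true)

// After the rewrite each Union child is limited too, so each side produces at
// most 10 rows; the outermost LIMIT still caps the final result.
sqlContext.sql(
  """SELECT id FROM (SELECT id FROM t1 LIMIT 10) a
    |UNION ALL
    |SELECT id FROM (SELECT id FROM t2 LIMIT 10) b
    |LIMIT 10
  """.stripMargin).explain(true)
{code}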



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13219) Pushdown predicate propagation in SparkSQL with join

2016-02-08 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137721#comment-15137721
 ] 

Xiao Li commented on SPARK-13219:
-

Let me try your SQL query in Spark 1.6.1. 

> Pushdown predicate propagation in SparkSQL with join
> 
>
> Key: SPARK-13219
> URL: https://issues.apache.org/jira/browse/SPARK-13219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.6.0
> Environment: Spark 1.4
> Datastax Spark connector 1.4
> Cassandra. 2.1.12
> Centos 6.6
>Reporter: Abhinav Chawade
>
> When 2 or more tables are joined in SparkSQL and there is an equality clause 
> in the query on attributes used to perform the join, it is useful to apply that 
> clause to the scans of both tables. If this is not done, one of the tables 
> results in a full scan, which can slow the query down dramatically. Consider the 
> following example with 2 tables being joined.
> {code}
> CREATE TABLE assets (
> assetid int PRIMARY KEY,
> address text,
> propertyname text
> )
> CREATE TABLE tenants (
> assetid int PRIMARY KEY,
> name text
> )
> spark-sql> explain select t.name from tenants t, assets a where a.assetid = 
> t.assetid and t.assetid='1201';
> WARN  2016-02-05 23:05:19 org.apache.hadoop.util.NativeCodeLoader: Unable to 
> load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> == Physical Plan ==
> Project [name#14]
>  ShuffledHashJoin [assetid#13], [assetid#15], BuildRight
>   Exchange (HashPartitioning 200)
>Filter (CAST(assetid#13, DoubleType) = 1201.0)
> HiveTableScan [assetid#13,name#14], (MetastoreRelation element, tenants, 
> Some(t)), None
>   Exchange (HashPartitioning 200)
>HiveTableScan [assetid#15], (MetastoreRelation element, assets, Some(a)), 
> None
> Time taken: 1.354 seconds, Fetched 8 row(s)
> {code}
> The simple workaround is to add another equality condition for each table but 
> it becomes cumbersome. It will be helpful if the query planner could improve 
> filter propagation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual

2016-02-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10561:
---
Description: 
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:


A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.

  was:
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.


> Provide tooling for auto-generating Spark SQL reference manual
> --
>
> Key: SPARK-10561
> URL: https://issues.apache.org/jira/browse/SPARK-10561
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Reporter: Ted Yu
>
> Here is the discussion thread:
> http://search-hadoop.com/m/q3RTtcD20F1o62xE
> Richard Hillegas made the following suggestion:
> A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
> to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
> support provided by the Scala language. I am new to programming in Scala, so 
> I don't know whether the Scala ecosystem provides any good tools for 
> reverse-engineering a BNF from a class which extends 
> scala.util.parsing.combinator.syntactical.StandardTokenParsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13180) Protect against SessionState being null when accessing HiveClientImpl#conf

2016-02-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137674#comment-15137674
 ] 

Ted Yu commented on SPARK-13180:


I wonder if we should provide a better error message when the NPE happens - the 
cause may be mixed dependencies. See the last response on the thread.
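
A minimal sketch of such a guard, assuming the conf lookup goes through Hive's thread-local SessionState (illustrative only, not the actual HiveClientImpl/ClientWrapper code):

{code}
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

// Fail with a descriptive message instead of an NPE when SessionState is missing.
def currentHiveConf(): HiveConf = {
  val state = SessionState.get()
  if (state == null) {
    throw new IllegalStateException(
      "Hive SessionState is null; this can happen when mixed or incompatible Hive " +
        "dependencies are on the classpath. See the thread referenced in this issue.")
  }
  state.getConf
}
{code}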

> Protect against SessionState being null when accessing HiveClientImpl#conf
> --
>
> Key: SPARK-13180
> URL: https://issues.apache.org/jira/browse/SPARK-13180
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Ted Yu
>Priority: Minor
> Attachments: spark-13180-util.patch
>
>
> See this thread http://search-hadoop.com/m/q3RTtFoTDi2HVCrM1
> {code}
> java.lang.NullPointerException
> at 
> org.apache.spark.sql.hive.client.ClientWrapper.conf(ClientWrapper.scala:205)
> at 
> org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552)
> at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551)
> at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:538)
> at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:537)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:537)
> at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
> at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
> at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:457)
> at 
> org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:457)
> at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:456)
> at org.apache.spark.sql.hive.HiveContext$$anon$3.<init>(HiveContext.scala:473)
> at 
> org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:473)
> at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:472)
> at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
> at 
> org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442)
> at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:223)
> at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13236) SQL generation support for union

2016-02-08 Thread Xiao Li (JIRA)
Xiao Li created SPARK-13236:
---

 Summary: SQL generation support for union
 Key: SPARK-13236
 URL: https://issues.apache.org/jira/browse/SPARK-13236
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


checkHiveQl("SELECT * FROM t0 UNION SELECT * FROM t0")



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


