[jira] [Commented] (SPARK-13634) Assigning spark context to variable results in serialization error

2016-03-07 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184597#comment-15184597
 ] 

Chris A. Mattmann commented on SPARK-13634:
---

Sean, thanks for your reply. We can agree to disagree on the semantics. I've 
been doing open source for a long time, and leaving JIRAs open for longer than 
43 minutes is not damaging by any means. As a former Spark mentor during its 
Incubation, and its Champion, I also disagree; I was involved in Spark from its 
early inception here at the ASF, and I have not always seen this type of 
behavior, which is why it's troubling to me. Your comparison of one end of the 
spectrum (10 JIRAs) to thousands, in both volume of JIRAs and activity, also 
kind of leaves a sour taste in my mouth. I know Spark gets lots of activity. So 
do many of the projects I've helped start and contribute to (Hadoop, 
Lucene/Solr, Nutch during its heyday, etc.). I left JIRAs open for longer than 
43 minutes in those projects, as did many others wiser than me who have been 
around a lot longer than me in open source. 

Thanks for taking time to think through what may be causing it. I'll choose to 
take the positive away from your reply and try to report back more on our 
workarounds in SciSpark and on our project.

--Chris

> Assigning spark context to variable results in serialization error
> --
>
> Key: SPARK-13634
> URL: https://issues.apache.org/jira/browse/SPARK-13634
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Reporter: Rahul Palamuttam
>Priority: Minor
>
> The following lines of code cause a task serialization error when executed in 
> the spark-shell. 
> Note that the error does not occur when submitting the code as a batch job 
> via spark-submit.
> {code}
> val temp = 10
> val newSC = sc
> val newRDD = newSC.parallelize(0 to 100).map(p => p + temp)
> {code}
> For some reason, when temp is pulled into the referencing environment of the 
> closure, so is the SparkContext. 
> We originally hit this issue in the SciSpark project, when referencing a 
> string variable inside of a lambda expression in RDD.map(...).
> Any insight into how this could be resolved would be appreciated.
> While the above code is trivial, SciSpark uses a wrapper around the 
> SparkContext to read from various file formats. We want to keep this class 
> structure and also use it in notebook and shell environments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13734) SparkR histogram

2016-03-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13734:
--
Priority: Minor  (was: Major)

[~olarayej] this isn't sufficient for a JIRA. Please fill out title and 
description properly.

> SparkR histogram
> 
>
> Key: SPARK-13734
> URL: https://issues.apache.org/jira/browse/SPARK-13734
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13736) Big-Endian platform issues

2016-03-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184583#comment-15184583
 ] 

Sean Owen commented on SPARK-13736:
---

Let's not make an umbrella, since it doesn't cover much, and doesn't actually 
cover other big-endian issues in the past. 
These are discrete JIRAs underneath.

> Big-Endian platform issues
> ---
>
> Key: SPARK-13736
> URL: https://issues.apache.org/jira/browse/SPARK-13736
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Luciano Resende
>Priority: Critical
>
> We are starting to see a few issues when building/testing on Big-Endian 
> platforms. This serves as an umbrella JIRA to group all platform-specific 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13634) Assigning spark context to variable results in serialization error

2016-03-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184563#comment-15184563
 ] 

Sean Owen commented on SPARK-13634:
---

JIRAs can be reopened, and should be if there's a change: for example, you have 
a pull request to propose, or a different example or more analysis that 
suggests it's not just a Scala REPL thing. People can still comment on JIRAs too.

All else equal, a reply in 43 minutes is a good thing. While I can appreciate 
that, ideally, we'd always let the reporter explicitly confirm they're done or 
something, that's not feasible in this project. On average a JIRA is opened 
every _hour_, many of which never receive any follow-up. Leaving them open is 
damaging too, since people inevitably parse that as "legitimate issue I should 
work on or wait on". If I see a quite-likely answer, I'd rather reflect it in 
JIRA, and once in a while overturn it, since reopening is a normal lightweight 
operation that can be performed by the reporter.

Further, the reality is that about half of those JIRAs are not real problems, 
are badly described or poorly researched, etc. (not this one), and actually 
_need_ rapid pushback, with pointers to the contribution guide, to discourage 
more of that behavior.

This is why some things get resolved fast in general, and it's with the intent 
of putting limited time to best use for the most people, and getting most 
people some quick feedback. I understand it's not how a project with 10 JIRAs a 
month probably operates, but I disagree that my reply was wrong or impolite.

Instead, I'd certainly welcome materially more information and a proposed 
change if you want to pursue and reopen this. For example, off the top of my 
head: does the ClosureCleaner treat {{sc}} specially? It may do so because 
there isn't supposed to be a second context in the application.

However if this is your real code, I strongly suspect you have a simple 
workaround in refactoring the third line into a function on an {{object}} (i.e. 
static). The layer of indirection, or something similar, likely avoids tripping 
on this. This is what I've suggested you pursue next. If that works, that's 
great info to paste here, at least as confirmation. Or if not, add it here 
anyway to show what else doesn't work.
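
For illustration, a minimal sketch of that suggested workaround (untested here; 
the element type and the {{Ops}}/{{addTemp}} names are just placeholders):

{code}
// Hedged sketch: move the mapping into a method on an object so the closure
// only captures its own parameters, not the REPL line that aliases sc.
object Ops {
  def addTemp(rdd: org.apache.spark.rdd.RDD[Int], temp: Int): org.apache.spark.rdd.RDD[Int] =
    rdd.map(p => p + temp)
}

val temp = 10
val newSC = sc
val newRDD = Ops.addTemp(newSC.parallelize(0 to 100), temp)
{code}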

> Assigning spark context to variable results in serialization error
> --
>
> Key: SPARK-13634
> URL: https://issues.apache.org/jira/browse/SPARK-13634
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Reporter: Rahul Palamuttam
>Priority: Minor
>
> The following lines of code cause a task serialization error when executed in 
> the spark-shell. 
> Note that the error does not occur when submitting the code as a batch job 
> via spark-submit.
> {code}
> val temp = 10
> val newSC = sc
> val newRDD = newSC.parallelize(0 to 100).map(p => p + temp)
> {code}
> For some reason, when temp is pulled into the referencing environment of the 
> closure, so is the SparkContext. 
> We originally hit this issue in the SciSpark project, when referencing a 
> string variable inside of a lambda expression in RDD.map(...).
> Any insight into how this could be resolved would be appreciated.
> While the above code is trivial, SciSpark uses a wrapper around the 
> SparkContext to read from various file formats. We want to keep this class 
> structure and also use it in notebook and shell environments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13634) Assigning spark context to variable results in serialization error

2016-03-07 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184536#comment-15184536
 ] 

Chris A. Mattmann commented on SPARK-13634:
---

I'm CC'ed because I'm the PI of the SciSpark project and I asked Rahul to file 
this issue here. It's not a toy example - it's a real example from our system. 
We have a workaround but were wondering if Apache Spark had thought of anything 
better or seen something similar. 

Our code is here: 
https://github.com/Scispark/scispark/

The question I was asking was related to etiquette. I don't think it's good 
etiquette to close tickets before the reporter has weighed in. This was closed 
literally in 43 minutes, without even waiting for Rahul to chime back in. Is it 
really that urgent to close an issue that a user has reported, that quickly, 
without hearing back from them to see if your suggestion helped or answered 
their question?

> Assigning spark context to variable results in serialization error
> --
>
> Key: SPARK-13634
> URL: https://issues.apache.org/jira/browse/SPARK-13634
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Reporter: Rahul Palamuttam
>Priority: Minor
>
> The following lines of code cause a task serialization error when executed in 
> the spark-shell. 
> Note that the error does not occur when submitting the code as a batch job 
> via spark-submit.
> {code}
> val temp = 10
> val newSC = sc
> val newRDD = newSC.parallelize(0 to 100).map(p => p + temp)
> {code}
> For some reason, when temp is pulled into the referencing environment of the 
> closure, so is the SparkContext. 
> We originally hit this issue in the SciSpark project, when referencing a 
> string variable inside of a lambda expression in RDD.map(...).
> Any insight into how this could be resolved would be appreciated.
> While the above code is trivial, SciSpark uses a wrapper around the 
> SparkContext to read from various file formats. We want to keep this class 
> structure and also use it in notebook and shell environments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13740) add null check for _verify_type in types.py

2016-03-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184532#comment-15184532
 ] 

Apache Spark commented on SPARK-13740:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/11574

> add null check for _verify_type in types.py
> ---
>
> Key: SPARK-13740
> URL: https://issues.apache.org/jira/browse/SPARK-13740
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13740) add null check for _verify_type in types.py

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13740:


Assignee: (was: Apache Spark)

> add null check for _verify_type in types.py
> ---
>
> Key: SPARK-13740
> URL: https://issues.apache.org/jira/browse/SPARK-13740
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13740) add null check for _verify_type in types.py

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13740:


Assignee: Apache Spark

> add null check for _verify_type in types.py
> ---
>
> Key: SPARK-13740
> URL: https://issues.apache.org/jira/browse/SPARK-13740
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13740) add null check for _verify_type in types.py

2016-03-07 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-13740:
---

 Summary: add null check for _verify_type in types.py
 Key: SPARK-13740
 URL: https://issues.apache.org/jira/browse/SPARK-13740
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13231) Make Accumulable.countFailedValues a user facing API.

2016-03-07 Thread Prashant Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-13231:

Description: Exposing it to users has no disadvantage I can think of, but it 
can be useful for them. One scenario can be a user-defined metric. It also 
clarifies the fact that, by default, we do not include values from failed 
tasks, and that this behavior can be changed if a user is using it to account 
for some metrics.  
(was: Rename Accumulable.countFailedValues to 
Accumulable.includeValuesOfFailedTasks (or includeFailedTasks) I liked the 
longer version though. 

Exposing it to user has no disadvantage I can think of, but it can be useful 
for them. One scenario can be a user defined metric.)

> Make Accumulable.countFailedValues a user facing API.
> -
>
> Key: SPARK-13231
> URL: https://issues.apache.org/jira/browse/SPARK-13231
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Prashant Sharma
>Priority: Minor
>
> Exposing it to users has no disadvantage I can think of, but it can be useful 
> for them. One scenario can be a user-defined metric. It also clarifies the 
> fact that, by default, we do not include values from failed tasks, and that 
> this behavior can be changed if a user is using it to account for some metrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3200) Class defined with reference to external variables crashes in REPL.

2016-03-07 Thread Prashant Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184510#comment-15184510
 ] 

Prashant Sharma commented on SPARK-3200:


Yes, the issue SPARK-13634 is actually a duplicate. I have linked it as a 
duplicate.

> Class defined with reference to external variables crashes in REPL.
> ---
>
> Key: SPARK-3200
> URL: https://issues.apache.org/jira/browse/SPARK-3200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>
> Reproducer:
> {noformat}
> val a = sc.textFile("README.md").count
> case class A(i: Int) { val j = a} 
> sc.parallelize(1 to 10).map(A(_)).collect()
> {noformat}
> This will happen only in distributed mode, when one refers to something that 
> refers to sc, and not otherwise. 
> There are many ways to work around this, like directly assigning a constant 
> value instead of referring to the variable. 
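
For illustration, a minimal sketch of the constant-value workaround mentioned 
above (untested here; it assumes the same spark-shell session as the reproducer):

{code}
// Hedged sketch: use a literal constant inside the class instead of referring
// to the REPL variable `a`, whose wrapper object also holds a reference to sc.
val a = sc.textFile("README.md").count   // still fine to compute and inspect
case class A(i: Int) { val j = 10L }     // constant value; nothing refers back to sc
sc.parallelize(1 to 10).map(A(_)).collect()
{code}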



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13634) Assigning spark context to variable results in serialization error

2016-03-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184505#comment-15184505
 ] 

Sean Owen commented on SPARK-13634:
---

Chris, I resolved this as a duplicate of an issue that's "Won't Fix". I'm not 
suggesting there is a resolution in Spark. The implicit workaround here is to 
not declare newSC, of course. There may be others, and that may matter since I 
suspect this is just a toy example. Without seeing real code, I couldn't say 
more about other workarounds. I'm not sure why you were CCed, but what are you 
taking issue with?

> Assigning spark context to variable results in serialization error
> --
>
> Key: SPARK-13634
> URL: https://issues.apache.org/jira/browse/SPARK-13634
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Reporter: Rahul Palamuttam
>Priority: Minor
>
> The following lines of code cause a task serialization error when executed in 
> the spark-shell. 
> Note that the error does not occur when submitting the code as a batch job 
> via spark-submit.
> {code}
> val temp = 10
> val newSC = sc
> val newRDD = newSC.parallelize(0 to 100).map(p => p + temp)
> {code}
> For some reason, when temp is pulled into the referencing environment of the 
> closure, so is the SparkContext. 
> We originally hit this issue in the SciSpark project, when referencing a 
> string variable inside of a lambda expression in RDD.map(...).
> Any insight into how this could be resolved would be appreciated.
> While the above code is trivial, SciSpark uses a wrapper around the 
> SparkContext to read from various file formats. We want to keep this class 
> structure and also use it in notebook and shell environments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5581) When writing sorted map output file, avoid open / close between each partition

2016-03-07 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184506#comment-15184506
 ] 

Josh Rosen commented on SPARK-5581:
---

[~sitalke...@gmail.com], I'm not working on this right now so feel free to 
submit a PR. Before you do, though, you might want to take a peek at 
https://github.com/apache/spark/pull/11498 to see whether the refactorings in 
that PR might make things easier. I'd also appreciate help on review of that 
PR, since it's kind of tricky and needs a bit more work.

> When writing sorted map output file, avoid open / close between each partition
> --
>
> Key: SPARK-5581
> URL: https://issues.apache.org/jira/browse/SPARK-5581
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 1.3.0
>Reporter: Sandy Ryza
>
> {code}
>   // Bypassing merge-sort; get an iterator by partition and just write 
> everything directly.
>   for ((id, elements) <- this.partitionedIterator) {
> if (elements.hasNext) {
>   val writer = blockManager.getDiskWriter(
> blockId, outputFile, ser, fileBufferSize, 
> context.taskMetrics.shuffleWriteMetrics.get)
>   for (elem <- elements) {
> writer.write(elem)
>   }
>   writer.commitAndClose()
>   val segment = writer.fileSegment()
>   lengths(id) = segment.length
> }
>   }
> {code}
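
For illustration, a self-contained sketch of the idea in this ticket (plain JDK 
I/O, not Spark's shuffle code): open the output file once, stream every 
partition's records through the same stream, and track each partition's byte 
length instead of re-opening a writer per partition.

{code}
import java.io.{BufferedOutputStream, FileOutputStream}

// Hedged sketch only; partition ids are assumed to be 0-based and dense.
def writePartitions(
    partitions: Iterator[(Int, Iterator[Array[Byte]])],
    path: String,
    numPartitions: Int): Array[Long] = {
  val lengths = new Array[Long](numPartitions)
  val out = new BufferedOutputStream(new FileOutputStream(path))
  try {
    for ((id, records) <- partitions; rec <- records) {
      out.write(rec)               // single open stream for all partitions
      lengths(id) += rec.length    // record the segment length per partition
    }
  } finally {
    out.close()
  }
  lengths
}
{code}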



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5581) When writing sorted map output file, avoid open / close between each partition

2016-03-07 Thread Sital Kedia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184491#comment-15184491
 ] 

Sital Kedia commented on SPARK-5581:


Is anyone working on this issue? If not, I would like to work on it. We are 
seeing very bad map-side performance when the number of partitions is very 
large, because of this issue. 

> When writing sorted map output file, avoid open / close between each partition
> --
>
> Key: SPARK-5581
> URL: https://issues.apache.org/jira/browse/SPARK-5581
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 1.3.0
>Reporter: Sandy Ryza
>
> {code}
>   // Bypassing merge-sort; get an iterator by partition and just write 
> everything directly.
>   for ((id, elements) <- this.partitionedIterator) {
> if (elements.hasNext) {
>   val writer = blockManager.getDiskWriter(
> blockId, outputFile, ser, fileBufferSize, 
> context.taskMetrics.shuffleWriteMetrics.get)
>   for (elem <- elements) {
> writer.write(elem)
>   }
>   writer.commitAndClose()
>   val segment = writer.fileSegment()
>   lengths(id) = segment.length
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3200) Class defined with reference to external variables crashes in REPL.

2016-03-07 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184486#comment-15184486
 ] 

Chris A. Mattmann commented on SPARK-3200:
--

Hi [~prashant_], thanks for your reply. If you look at the linked issue, someone 
linked the issue we were having in 
https://issues.apache.org/jira/browse/SPARK-13634 as a duplicate of this, so I 
just wanted to see if this was a fix for SPARK-13634.

> Class defined with reference to external variables crashes in REPL.
> ---
>
> Key: SPARK-3200
> URL: https://issues.apache.org/jira/browse/SPARK-3200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>
> Reproducer:
> {noformat}
> val a = sc.textFile("README.md").count
> case class A(i: Int) { val j = a} 
> sc.parallelize(1 to 10).map(A(_)).collect()
> {noformat}
> This will happen only in distributed mode, when one refers to something that 
> refers to sc, and not otherwise. 
> There are many ways to work around this, like directly assigning a constant 
> value instead of referring to the variable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3200) Class defined with reference to external variables crashes in REPL.

2016-03-07 Thread Prashant Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184468#comment-15184468
 ] 

Prashant Sharma edited comment on SPARK-3200 at 3/8/16 6:06 AM:


Hi [~chrismattmann], it never worked. I have clarified above. But since no one 
apart from me ever ran into this, and the complexity of the fix was non-trivial, 
it was resolved as "Won't Fix". 

Actually, we now use the Scala REPL "as is", without many modifications. So if 
this needs to be fixed, a considerably larger amount of change is required than 
was needed back then. Along with the change, the maintenance overhead would 
also be large.

However, if the fix is in high demand, one can go ahead and fix it in the Scala 
REPL too. It is also possible to work around it. Did you run into this issue?

[EDIT] The patch proposed in this JIRA could still be merged for the Scala 2.10 
port of the REPL that lives in Spark. But that should only be done if this fix 
is highly critical. 


was (Author: prashant_):
Hi [~chrismattmann], it never worked. I have clarified above. But since no one 
apart from me ever ran into this, and the complexity of the fix was non-trivial, 
it was resolved as "Won't Fix". 

Actually, we now use the Scala REPL "as is", without many modifications. So if 
this needs to be fixed, a considerably larger amount of change is required than 
was needed back then. Along with the change, the maintenance overhead would 
also be large.

However, if the fix is in high demand, one can go ahead and fix it in the Scala 
REPL too. It is also possible to work around it. Did you run into this issue?

> Class defined with reference to external variables crashes in REPL.
> ---
>
> Key: SPARK-3200
> URL: https://issues.apache.org/jira/browse/SPARK-3200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>
> Reproducer:
> {noformat}
> val a = sc.textFile("README.md").count
> case class A(i: Int) { val j = a} 
> sc.parallelize(1 to 10).map(A(_)).collect()
> {noformat}
> This will happen only in distributed mode, when one refers to something that 
> refers to sc, and not otherwise. 
> There are many ways to work around this, like directly assigning a constant 
> value instead of referring to the variable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-3200) Class defined with reference to external variables crashes in REPL.

2016-03-07 Thread Prashant Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma closed SPARK-3200.
--
Resolution: Won't Fix

> Class defined with reference to external variables crashes in REPL.
> ---
>
> Key: SPARK-3200
> URL: https://issues.apache.org/jira/browse/SPARK-3200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>
> Reproducer:
> {noformat}
> val a = sc.textFile("README.md").count
> case class A(i: Int) { val j = a} 
> sc.parallelize(1 to 10).map(A(_)).collect()
> {noformat}
> This will happen only in distributed mode, when one refers to something that 
> refers to sc, and not otherwise. 
> There are many ways to work around this, like directly assigning a constant 
> value instead of referring to the variable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-3200) Class defined with reference to external variables crashes in REPL.

2016-03-07 Thread Prashant Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma reopened SPARK-3200:


I am reopening only to close it again with Won't Fix.

> Class defined with reference to external variables crashes in REPL.
> ---
>
> Key: SPARK-3200
> URL: https://issues.apache.org/jira/browse/SPARK-3200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>
> Reproducer:
> {noformat}
> val a = sc.textFile("README.md").count
> case class A(i: Int) { val j = a} 
> sc.parallelize(1 to 10).map(A(_)).collect()
> {noformat}
> This will happen only in distributed mode, when one refers to something that 
> refers to sc, and not otherwise. 
> There are many ways to work around this, like directly assigning a constant 
> value instead of referring to the variable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3200) Class defined with reference to external variables crashes in REPL.

2016-03-07 Thread Prashant Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184468#comment-15184468
 ] 

Prashant Sharma commented on SPARK-3200:


Hi [~chrismattmann], it never worked. I have clarified above. But since no one 
apart from me ever ran into this, and the complexity of the fix was non-trivial, 
it was resolved as "Won't Fix". 

Actually, we now use the Scala REPL "as is", without many modifications. So if 
this needs to be fixed, a considerably larger amount of change is required than 
was needed back then. Along with the change, the maintenance overhead would 
also be large.

However, if the fix is in high demand, one can go ahead and fix it in the Scala 
REPL too. It is also possible to work around it. Did you run into this issue?

> Class defined with reference to external variables crashes in REPL.
> ---
>
> Key: SPARK-3200
> URL: https://issues.apache.org/jira/browse/SPARK-3200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>
> Reproducer:
> {noformat}
> val a = sc.textFile("README.md").count
> case class A(i: Int) { val j = a} 
> sc.parallelize(1 to 10).map(A(_)).collect()
> {noformat}
> This will happen only in distributed mode, when one refers to something that 
> refers to sc, and not otherwise. 
> There are many ways to work around this, like directly assigning a constant 
> value instead of referring to the variable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13659) Remove returnValues from BlockStore APIs

2016-03-07 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-13659.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Remove returnValues from BlockStore APIs
> 
>
> Key: SPARK-13659
> URL: https://issues.apache.org/jira/browse/SPARK-13659
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 2.0.0
>
>
> In preparation for larger refactorings, I think that we should remove the 
> confusing returnValues() option from the BlockStore put() APIs: returning the 
> value is only useful in one place (caching) and in other situations, such as 
> block replication, it's simpler to put() and then get().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3200) Class defined with reference to external variables crashes in REPL.

2016-03-07 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1518#comment-1518
 ] 

Chris A. Mattmann commented on SPARK-3200:
--

How did this get resolved, other than by simply saying that in 1.6 it worked? 
I'm confused.

> Class defined with reference to external variables crashes in REPL.
> ---
>
> Key: SPARK-3200
> URL: https://issues.apache.org/jira/browse/SPARK-3200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>
> Reproducer:
> {noformat}
> val a = sc.textFile("README.md").count
> case class A(i: Int) { val j = a} 
> sc.parallelize(1 to 10).map(A(_)).collect()
> {noformat}
> This will happen only in distributed mode, when one refers to something that 
> refers to sc, and not otherwise. 
> There are many ways to work around this, like directly assigning a constant 
> value instead of referring to the variable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13634) Assigning spark context to variable results in serialization error

2016-03-07 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184442#comment-15184442
 ] 

Chris A. Mattmann commented on SPARK-13634:
---

Hi [~srowen], it would have been nice to make sure this resolves [~Rahul 
Palamuttam]'s issue before closing it.
Isn't that simply good etiquette?

> Assigning spark context to variable results in serialization error
> --
>
> Key: SPARK-13634
> URL: https://issues.apache.org/jira/browse/SPARK-13634
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Reporter: Rahul Palamuttam
>Priority: Minor
>
> The following lines of code cause a task serialization error when executed in 
> the spark-shell. 
> Note that the error does not occur when submitting the code as a batch job 
> via spark-submit.
> {code}
> val temp = 10
> val newSC = sc
> val newRDD = newSC.parallelize(0 to 100).map(p => p + temp)
> {code}
> For some reason, when temp is pulled into the referencing environment of the 
> closure, so is the SparkContext. 
> We originally hit this issue in the SciSpark project, when referencing a 
> string variable inside of a lambda expression in RDD.map(...).
> Any insight into how this could be resolved would be appreciated.
> While the above code is trivial, SciSpark uses a wrapper around the 
> SparkContext to read from various file formats. We want to keep this class 
> structure and also use it in notebook and shell environments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13231) Make Accumulable.countFailedValues a user facing API.

2016-03-07 Thread Prashant Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-13231:

Summary: Make Accumulable.countFailedValues a user facing API.  (was: 
Rename Accumulable.countFailedValues to Accumulable.includeValuesOfFailedTasks 
and make it a user facing API.)

> Make Accumulable.countFailedValues a user facing API.
> -
>
> Key: SPARK-13231
> URL: https://issues.apache.org/jira/browse/SPARK-13231
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Prashant Sharma
>Priority: Minor
>
> Rename Accumulable.countFailedValues to 
> Accumulable.includeValuesOfFailedTasks (or includeFailedTasks) I liked the 
> longer version though. 
> Exposing it to user has no disadvantage I can think of, but it can be useful 
> for them. One scenario can be a user defined metric.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13711) Apache Spark driver stopping JVM when master not available

2016-03-07 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-13711.
--
   Resolution: Fixed
 Assignee: Shixiong Zhu
Fix Version/s: 2.0.0
   1.6.2

> Apache Spark driver stopping JVM when master not available 
> ---
>
> Key: SPARK-13711
> URL: https://issues.apache.org/jira/browse/SPARK-13711
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.1, 1.6.0
>Reporter: Era
>Assignee: Shixiong Zhu
> Fix For: 1.6.2, 2.0.0
>
>
> In my application, a Java Spark context is created with an unavailable master 
> URL (you may assume the master is down for maintenance). Creating the Java 
> Spark context then stops the JVM that runs the Spark driver, with JVM exit 
> code 50.
> When I checked the logs I found SparkUncaughtExceptionHandler calling 
> System.exit. My program should run forever. 
> {code}
> package test.mains;
> 
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaSparkContext;
> 
> public class CheckJavaSparkContext {
> 
>     public static void main(String[] args) {
>         SparkConf conf = new SparkConf();
>         conf.setAppName("test");
>         conf.setMaster("spark://sunshinee:7077");
>         try {
>             new JavaSparkContext(conf);
>         } catch (Throwable e) {
>             System.out.println("Caught an exception : " + e.getMessage());
>         }
>         System.out.println("Waiting to complete...");
>         while (true) {
>         }
>     }
> }
> {code}
> Output log
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 16/03/04 18:01:15 INFO SparkContext: Running Spark version 1.6.0
> 16/03/04 18:01:17 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/03/04 18:01:17 WARN Utils: Your hostname, pesamara-mobl-vm1 resolves to a 
> loopback address: 127.0.0.1; using 10.30.9.107 instead (on interface eth0)
> 16/03/04 18:01:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 16/03/04 18:01:18 INFO SecurityManager: Changing view acls to: ps40233
> 16/03/04 18:01:18 INFO SecurityManager: Changing modify acls to: ps40233
> 16/03/04 18:01:18 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(ps40233); users 
> with modify permissions: Set(ps40233)
> 16/03/04 18:01:19 INFO Utils: Successfully started service 'sparkDriver' on 
> port 55309.
> 16/03/04 18:01:21 INFO Slf4jLogger: Slf4jLogger started
> 16/03/04 18:01:21 INFO Remoting: Starting remoting
> 16/03/04 18:01:22 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkDriverActorSystem@10.30.9.107:52128]
> 16/03/04 18:01:22 INFO Utils: Successfully started service 
> 'sparkDriverActorSystem' on port 52128.
> 16/03/04 18:01:22 INFO SparkEnv: Registering MapOutputTracker
> 16/03/04 18:01:22 INFO SparkEnv: Registering BlockManagerMaster
> 16/03/04 18:01:22 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-87c20178-357d-4252-a46a-62a755568a98
> 16/03/04 18:01:22 INFO MemoryStore: MemoryStore started with capacity 457.7 MB
> 16/03/04 18:01:22 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/03/04 18:01:23 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 16/03/04 18:01:23 INFO SparkUI: Started SparkUI at http://10.30.9.107:4040
> 16/03/04 18:01:24 INFO AppClient$ClientEndpoint: Connecting to master 
> spark://sunshinee:7077...
> 16/03/04 18:01:24 WARN AppClient$ClientEndpoint: Failed to connect to master 
> sunshinee:7077
> java.io.IOException: Failed to connect to sunshinee:7077
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
>  at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
> at 
> org.apache.spark.rpc.netty.NettyRpcEnv.

[jira] [Commented] (SPARK-13139) Create native DDL commands

2016-03-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184402#comment-15184402
 ] 

Apache Spark commented on SPARK-13139:
--

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/11573

> Create native DDL commands
> --
>
> Key: SPARK-13139
> URL: https://issues.apache.org/jira/browse/SPARK-13139
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>
> We currently delegate most DDLs directly to Hive, through NativePlaceholder 
> in HiveQl.scala. In Spark 2.0, we want to provide native implementations for 
> DDLs for both SQLContext and HiveContext.
> The first step is to properly parse these DDLs, and then create logical 
> commands that encapsulate them. The actual implementation can still delegate 
> to HiveNativeCommand. As an example, we should define a command for 
> RenameTable with the proper fields, and just delegate the implementation to 
> HiveNativeCommand (we might need to track the original sql query in order to 
> run HiveNativeCommand, but we can remove the sql query in the future once we 
> do the next step).
> Once we flesh out the internal persistent catalog API, we can then switch the 
> implementation of these newly added commands to use the catalog API.
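
For illustration, a simplified sketch of the shape described above (names are 
illustrative, not Spark's actual classes): parse a DDL statement into a typed 
command that still carries the original SQL text, so execution can fall back to 
Hive for now.

{code}
// Hedged sketch; RenameTable and parseRenameTable are hypothetical names.
case class RenameTable(oldName: String, newName: String, originalSql: String)

def parseRenameTable(sql: String): Option[RenameTable] = {
  val Pattern = """(?i)ALTER TABLE\s+(\w+)\s+RENAME\s+TO\s+(\w+)""".r
  sql.trim match {
    case Pattern(oldName, newName) => Some(RenameTable(oldName, newName, sql))
    case _                         => None
  }
}
{code}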



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-07 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184387#comment-15184387
 ] 

Cody Koeninger commented on SPARK-12177:


I've been hacking on a simple LRU cache for consumers, and preferred
locations to take advantage of it. Will update here if it works out.

There are some things about the new consumer that make it awkward for this
purpose, mentioned on the Kafka dev list, but there has been no real response.
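
For illustration, a minimal sketch of the kind of LRU cache mentioned above 
(generic key/value types; the real version would presumably be keyed by 
topic-partition and hold Kafka consumers):

{code}
import java.{util => ju}

// Hedged sketch only: LinkedHashMap in access order evicts the least recently
// used entry once the cache exceeds maxSize.
class LruCache[K, V](maxSize: Int) {
  private val cache = new ju.LinkedHashMap[K, V](16, 0.75f, true) {
    override def removeEldestEntry(eldest: ju.Map.Entry[K, V]): Boolean =
      size() > maxSize
  }
  def getOrCreate(key: K)(create: => V): V =
    Option(cache.get(key)).getOrElse {
      val v = create
      cache.put(key, v)
      v
    }
}
{code}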



> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that 
> is not compatible with the old one. So, I added the new consumer API. I made 
> separate classes in the package org.apache.spark.streaming.kafka.v09 with the 
> changed API. I didn't remove the old classes, for backward compatibility. Users 
> will not need to change their old Spark applications when they upgrade to the 
> new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13593) improve the `createDataFrame` method to accept data type string and verify the data

2016-03-07 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-13593:

Summary: improve the `createDataFrame` method to accept data type string 
and verify the data  (was: improve the `toDF()` method to accept data type 
string and verify the data)

> improve the `createDataFrame` method to accept data type string and verify 
> the data
> ---
>
> Key: SPARK-13593
> URL: https://issues.apache.org/jira/browse/SPARK-13593
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13404) Create the variables for input when it's used

2016-03-07 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-13404.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11274
[https://github.com/apache/spark/pull/11274]

> Create the variables for input when it's used
> -
>
> Key: SPARK-13404
> URL: https://issues.apache.org/jira/browse/SPARK-13404
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> Right now, we create the variables in the first operator (usually 
> InputAdapter); they could be wasted if most of the rows are filtered out 
> immediately.
> We should defer that until they are used by the following operators.
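
For illustration, a hand-written analogue of the change (not the actual 
generated code; Row and the column expressions here are just placeholders):

{code}
// Hedged sketch: defer computing values that only the projection needs until
// after the filter has accepted the row.
case class Row(a: Int, b: String)

// Before: every column value is materialized up front, even for filtered rows.
def eager(rows: Iterator[Row]): Iterator[Int] =
  rows.map { r =>
    val a   = r.a
    val len = r.b.length          // wasted work if the row is filtered out
    (a, len)
  }.filter(_._1 > 0).map(_._2)

// After: the expensive value is computed only for rows that pass the filter.
def deferred(rows: Iterator[Row]): Iterator[Int] =
  rows.filter(_.a > 0).map(r => r.b.length)
{code}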



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-07 Thread Mansi Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184346#comment-15184346
 ] 

Mansi Shah commented on SPARK-12177:


So it looks like we are dealing with two independent issues here: (a) version 
support, and (b) the 0.9.0 plugin design. Should we at least start hashing out 
the design for the new consumer, and then we can see where it fits? Do the 
concerned folks still want to get on a call to figure out the design?

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that 
> is not compatible with the old one. So, I added the new consumer API. I made 
> separate classes in the package org.apache.spark.streaming.kafka.v09 with the 
> changed API. I didn't remove the old classes, for backward compatibility. Users 
> will not need to change their old Spark applications when they upgrade to the 
> new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-13737) Add getOrCreate method for HiveContext

2016-03-07 Thread Mao, Wei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao, Wei closed SPARK-13737.

Resolution: Won't Fix

> Add getOrCreate method for HiveContext
> --
>
> Key: SPARK-13737
> URL: https://issues.apache.org/jira/browse/SPARK-13737
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Mao, Wei
>
> There is a "getOrCreate" method in SQLContext, which is useful to recoverable 
> streaming application with SQL operation. 
> https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations
> But the corresponding method is missing in HiveContext. 
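
For reference, a minimal sketch of the existing SQLContext pattern the 
description refers to (following the streaming guide; the surrounding 
StreamingContext and the `wordsDStream` input are assumed to exist):

{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.DStream

// Hedged sketch: SQLContext.getOrCreate reuses one context across batches and
// across restarts from a checkpoint; the request is for a HiveContext analogue.
def attachSql(wordsDStream: DStream[String]): Unit = {
  wordsDStream.foreachRDD { (rdd: RDD[String]) =>
    val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
    import sqlContext.implicits._
    val df = rdd.toDF("word")
    df.registerTempTable("words")
    sqlContext.sql("SELECT word, count(*) AS total FROM words GROUP BY word").show()
  }
}
{code}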



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13737) Add getOrCreate method for HiveContext

2016-03-07 Thread Mao, Wei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184263#comment-15184263
 ] 

Mao, Wei commented on SPARK-13737:
--

According to Reynold, Databricks is actually going to deprecate HiveContext in 
Spark 2.0, because it has been one of the most confusing contexts in Spark. And 
users can change the constructor call to just create a SQLContext/SparkSession.
See more in https://issues.apache.org/jira/browse/SPARK-13485

> Add getOrCreate method for HiveContext
> --
>
> Key: SPARK-13737
> URL: https://issues.apache.org/jira/browse/SPARK-13737
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Mao, Wei
>
> There is a "getOrCreate" method in SQLContext, which is useful to recoverable 
> streaming application with SQL operation. 
> https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations
> But the corresponding method is missing in HiveContext. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13731) expression evaluation for NaN in select statement

2016-03-07 Thread Ian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184034#comment-15184034
 ] 

Ian edited comment on SPARK-13731 at 3/8/16 2:35 AM:
-

The test case we provided uses a simple arithmetic expression (division of 
doubles), but in fact many other math functions show the same behavior of 
returning null for NaN/Infinity. 
For instance, log() and corr(). 
{code}
SELECT log(a/b), corr(a, b) FROM testNan order by a, b
{code}



was (Author: ianlcsd):
The test case we provided uses a simple arithmetic expression (division of 
doubles), but in fact many other math functions show the same behavior of 
returning null for NaN/Infinity. 
For instance, log() and corr(). 
{code}
SELECT log(a/b), corn(a, b) FROM testNan order by a, b
{code}


> expression evaluation for NaN in select statement
> -
>
> Key: SPARK-13731
> URL: https://issues.apache.org/jira/browse/SPARK-13731
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Ian
>
> We expect that the arithmetic expression a/b should:
> 1. return NaN if a=0 and b=0
> 2. return Infinity if a=1 and b=0
> Is this expectation reasonable? 
> The following is a simple test case snippet that reads from storage and 
> evaluates arithmetic expressions in a select.
> It assumes org.apache.spark.sql.hive.execution.SQLQuerySuite: 
> {code}
>   test("Expression should be evaluated to Nan/Infinity in Select") {
> withTable("testNan") {
>   withTempTable("src") {
> Seq((1d, 0d), (0d, 0d)).toDF().registerTempTable("src")
> sql("CREATE TABLE testNan(a double, b double) STORED AS PARQUET AS 
> SELECT * FROM src")
>   }
>   checkAnswer(sql(
> """
>   |SELECT a/b FROM testNan
> """.stripMargin),
> Seq(
>   Row(Double.PositiveInfinity),
>   Row(Double.NaN)
> )
>   )
> }
>   }
> == Physical Plan ==
> Project [(a#28 / b#29) AS _c0#30]
> +- Scan ParquetRelation: default.testnan[a#28,b#29] InputPaths: 
> file:/private/var/folders/dy/19y6pfm92pj9s40mbs8xd9hmgp/T/warehouse--5b617080-e909-4812-90e8-63d2dd0aef5a/testnan
> == Results ==
> !== Correct Answer - 2 ==   == Spark Answer - 2 ==
> ![Infinity] [null]
> ![NaN]  [null]
>   
> {code}
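
For comparison, plain Scala double arithmetic follows IEEE 754, which is the 
behavior the test expects from the SQL expression:

{code}
// 0.0 / 0.0 is NaN and 1.0 / 0.0 is positive infinity for doubles.
val nan: Double = 0.0 / 0.0     // Double.NaN
val inf: Double = 1.0 / 0.0     // Double.PositiveInfinity
{code}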



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13731) expression evaluation for NaN in select statement

2016-03-07 Thread Ian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184034#comment-15184034
 ] 

Ian edited comment on SPARK-13731 at 3/8/16 2:35 AM:
-

The test case we provided uses a simple arithmetic expression (division of 
doubles), but in fact many other math functions show the same behavior of 
returning null for NaN/Infinity. 
For instance, log() and corr(). 
{code}
SELECT log(a/b), corr(a, b) FROM testNan group by a, b order by a, b
{code}



was (Author: ianlcsd):
The test case we provided uses a simple arithmetic expression (division of 
doubles), but in fact many other math functions show the same behavior of 
returning null for NaN/Infinity. 
For instance, log() and corr(). 
{code}
SELECT log(a/b), corr(a, b) FROM testNan order by a, b
{code}


> expression evaluation for NaN in select statement
> -
>
> Key: SPARK-13731
> URL: https://issues.apache.org/jira/browse/SPARK-13731
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Ian
>
> We expect that the arithmetic expression a/b should:
> 1. return NaN if a=0 and b=0
> 2. return Infinity if a=1 and b=0
> Is this expectation reasonable? 
> The following is a simple test case snippet that reads from storage and 
> evaluates arithmetic expressions in a select.
> It assumes org.apache.spark.sql.hive.execution.SQLQuerySuite: 
> {code}
>   test("Expression should be evaluated to Nan/Infinity in Select") {
> withTable("testNan") {
>   withTempTable("src") {
> Seq((1d, 0d), (0d, 0d)).toDF().registerTempTable("src")
> sql("CREATE TABLE testNan(a double, b double) STORED AS PARQUET AS 
> SELECT * FROM src")
>   }
>   checkAnswer(sql(
> """
>   |SELECT a/b FROM testNan
> """.stripMargin),
> Seq(
>   Row(Double.PositiveInfinity),
>   Row(Double.NaN)
> )
>   )
> }
>   }
> == Physical Plan ==
> Project [(a#28 / b#29) AS _c0#30]
> +- Scan ParquetRelation: default.testnan[a#28,b#29] InputPaths: 
> file:/private/var/folders/dy/19y6pfm92pj9s40mbs8xd9hmgp/T/warehouse--5b617080-e909-4812-90e8-63d2dd0aef5a/testnan
> == Results ==
> !== Correct Answer - 2 ==   == Spark Answer - 2 ==
> ![Infinity] [null]
> ![NaN]  [null]
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13739) Predicate Push Down For Window Operator

2016-03-07 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-13739:

Description: Push down the predicate through the Window operator.

> Predicate Push Down For Window Operator
> ---
>
> Key: SPARK-13739
> URL: https://issues.apache.org/jira/browse/SPARK-13739
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Push down the predicate through the Window operator.
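
For illustration, a hedged sketch of the optimization in DataFrame terms (`df` 
is an assumed DataFrame with columns `a` and `b`): a predicate that only 
touches the PARTITION BY column does not change any window frame, so it can be 
evaluated below the Window operator.

{code}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val w = Window.partitionBy("a").orderBy("b")

// As written: the filter sits above the window computation.
val original = df.withColumn("r", rank().over(w)).filter(col("a") === 1)

// Logically equivalent, cheaper form after push-down: filter first, then window.
val pushed   = df.filter(col("a") === 1).withColumn("r", rank().over(w))
{code}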



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13738) Clean up ResolveDataSource

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13738:


Assignee: Apache Spark  (was: Michael Armbrust)

> Clean up ResolveDataSource
> --
>
> Key: SPARK-13738
> URL: https://issues.apache.org/jira/browse/SPARK-13738
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13738) Clean up ResolveDataSource

2016-03-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184244#comment-15184244
 ] 

Apache Spark commented on SPARK-13738:
--

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/11572

> Clean up ResolveDataSource
> --
>
> Key: SPARK-13738
> URL: https://issues.apache.org/jira/browse/SPARK-13738
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13739) Predicate Push Down Through Window Operator

2016-03-07 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-13739:

Summary: Predicate Push Down Through Window Operator  (was: Predicate Push 
Down For Window Operator)

> Predicate Push Down Through Window Operator
> ---
>
> Key: SPARK-13739
> URL: https://issues.apache.org/jira/browse/SPARK-13739
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Push down the predicate through the Window operator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13738) Clean up ResolveDataSource

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13738:


Assignee: Michael Armbrust  (was: Apache Spark)

> Clean up ResolveDataSource
> --
>
> Key: SPARK-13738
> URL: https://issues.apache.org/jira/browse/SPARK-13738
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13739) Predicate Push Down Through Window Operator

2016-03-07 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184245#comment-15184245
 ] 

Xiao Li commented on SPARK-13739:
-

Thanks! I am working on it. 

> Predicate Push Down Through Window Operator
> ---
>
> Key: SPARK-13739
> URL: https://issues.apache.org/jira/browse/SPARK-13739
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Push down the predicate through the Window operator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13739) Predicate Push Down For Window Operator

2016-03-07 Thread Xiao Li (JIRA)
Xiao Li created SPARK-13739:
---

 Summary: Predicate Push Down For Window Operator
 Key: SPARK-13739
 URL: https://issues.apache.org/jira/browse/SPARK-13739
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13664) Simplify and Speedup HadoopFSRelation

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13664:


Assignee: Michael Armbrust  (was: Apache Spark)

> Simplify and Speedup HadoopFSRelation
> -
>
> Key: SPARK-13664
> URL: https://issues.apache.org/jira/browse/SPARK-13664
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
>
> A majority of Spark SQL queries likely run through {{HadoopFSRelation}}; 
> however, there are currently several complexity and performance problems with 
> this code path:
>  - The class mixes the concerns of file management, schema reconciliation, 
> scan building, bucketing, partitioning, and writing data.
>  - For very large tables, we are broadcasting the entire list of files to 
> every executor. [SPARK-11441]
>  - For partitioned tables, we always do an extra projection.  This results 
> not only in a copy, but undoes much of the performance gains that we are 
> going to get from vectorized reads.
> This is an umbrella ticket to track a set of improvements to this codepath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13664) Simplify and Speedup HadoopFSRelation

2016-03-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184238#comment-15184238
 ] 

Apache Spark commented on SPARK-13664:
--

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/11572

> Simplify and Speedup HadoopFSRelation
> -
>
> Key: SPARK-13664
> URL: https://issues.apache.org/jira/browse/SPARK-13664
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
>
> A majority of Spark SQL queries likely run through {{HadoopFSRelation}}; 
> however, there are currently several complexity and performance problems with 
> this code path:
>  - The class mixes the concerns of file management, schema reconciliation, 
> scan building, bucketing, partitioning, and writing data.
>  - For very large tables, we are broadcasting the entire list of files to 
> every executor. [SPARK-11441]
>  - For partitioned tables, we always do an extra projection.  This results 
> not only in a copy, but undoes much of the performance gains that we are 
> going to get from vectorized reads.
> This is an umbrella ticket to track a set of improvements to this codepath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13664) Simplify and Speedup HadoopFSRelation

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13664:


Assignee: Apache Spark  (was: Michael Armbrust)

> Simplify and Speedup HadoopFSRelation
> -
>
> Key: SPARK-13664
> URL: https://issues.apache.org/jira/browse/SPARK-13664
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Apache Spark
>Priority: Blocker
>
> A majority of Spark SQL queries likely run through {{HadoopFSRelation}}; 
> however, there are currently several complexity and performance problems with 
> this code path:
>  - The class mixes the concerns of file management, schema reconciliation, 
> scan building, bucketing, partitioning, and writing data.
>  - For very large tables, we are broadcasting the entire list of files to 
> every executor. [SPARK-11441]
>  - For partitioned tables, we always do an extra projection.  This results 
> not only in a copy, but undoes much of the performance gains that we are 
> going to get from vectorized reads.
> This is an umbrella ticket to track a set of improvements to this codepath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13738) Clean up ResolveDataSource

2016-03-07 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13738:


 Summary: Clean up ResolveDataSource
 Key: SPARK-13738
 URL: https://issues.apache.org/jira/browse/SPARK-13738
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13737) Add getOrCreate method for HiveContext

2016-03-07 Thread Mao, Wei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184221#comment-15184221
 ] 

Mao, Wei commented on SPARK-13737:
--

HiveContext is heavily used by many users now, and many of them are still 
coupled to older Spark versions. As this change would be trivial and 
non-intrusive, I don't think it conflicts with the context combination work.

> Add getOrCreate method for HiveContext
> --
>
> Key: SPARK-13737
> URL: https://issues.apache.org/jira/browse/SPARK-13737
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Mao, Wei
>
> There is a "getOrCreate" method in SQLContext, which is useful to recoverable 
> streaming application with SQL operation. 
> https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations
> But the corresponding method is missing in HiveContext. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13737) Add getOrCreate method for HiveContext

2016-03-07 Thread Mao, Wei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184221#comment-15184221
 ] 

Mao, Wei edited comment on SPARK-13737 at 3/8/16 2:07 AM:
--

HiveContext is heavily used by many users now, and many of them are still 
coupled to older Spark versions. As this change would be trivial and 
non-intrusive, I don't think it conflicts with the context combination work.


was (Author: mwws):
HiveContext is heavily used by many users now, and enen many of them still 
coupled with old spark version. As this change would be trivial but not 
constructive, I think there is not conflict with the context combination work.

> Add getOrCreate method for HiveContext
> --
>
> Key: SPARK-13737
> URL: https://issues.apache.org/jira/browse/SPARK-13737
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Mao, Wei
>
> There is a "getOrCreate" method in SQLContext, which is useful to recoverable 
> streaming application with SQL operation. 
> https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations
> But the corresponding method is missing in HiveContext. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13721) Add support for LATERAL VIEW OUTER explode()

2016-03-07 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184213#comment-15184213
 ] 

Xiao Li commented on SPARK-13721:
-

That sounds reasonable. Maybe we can wait until DataFrame and DataSet APIs are 
combined. 

> Add support for LATERAL VIEW OUTER explode()
> 
>
> Key: SPARK-13721
> URL: https://issues.apache.org/jira/browse/SPARK-13721
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ian Hellstrom
>
> Hive supports the [LATERAL VIEW 
> OUTER|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView#LanguageManualLateralView-OuterLateralViews]
>  syntax to make sure that when an array is empty, the content from the outer 
> table is still returned. 
> Within Spark, this is currently only possible through the HiveContext, by 
> executing HiveQL statements. It would be nice if the standard explode() 
> DataFrame method allowed the same. A possible signature would be: 
> {code:scala}
> explode[A, B](inputColumn: String, outputColumn: String, outer: Boolean = 
> false)
> {code}
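For reference, a minimal sketch of the existing HiveQL syntax the ticket refers to, assuming a hypothetical Hive table events(id INT, tags ARRAY<STRING>) and a hiveContext in scope:

{code}
// With OUTER, rows whose tags array is empty or NULL are still returned,
// with tag = NULL, instead of being dropped.
val exploded = hiveContext.sql(
  """
    |SELECT id, tag
    |FROM events
    |LATERAL VIEW OUTER explode(tags) t AS tag
  """.stripMargin)
exploded.show()
{code}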



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13689) Move some methods in CatalystQl to a util object

2016-03-07 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-13689.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Move some methods in CatalystQl to a util object
> 
>
> Key: SPARK-13689
> URL: https://issues.apache.org/jira/browse/SPARK-13689
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> When we add more DDL parsing logic in the future, SparkQl will become very 
> big. To keep it smaller, we'll introduce helper "parser objects", e.g. one to 
> parse alter table commands. However, these parser objects will need to access 
> some helper methods that exist in CatalystQl. The proposal is to move those 
> methods to an isolated ParserUtils object.
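A rough sketch of the proposed shape, with illustrative names only (not the actual Spark code):

{code}
// Helper methods factored out of CatalystQl into a standalone object so that
// other parser helpers can reuse them without extending CatalystQl.
object ParserUtils {
  def cleanIdentifier(ident: String): String =
    ident.stripPrefix("`").stripSuffix("`")
}

// A hypothetical helper parser for ALTER TABLE commands.
class AlterTableCommandParser {
  def parseTableName(raw: String): String = ParserUtils.cleanIdentifier(raw)
}
{code}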



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13587) Support virtualenv in PySpark

2016-03-07 Thread Mike Sukmanowsky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184210#comment-15184210
 ] 

Mike Sukmanowsky commented on SPARK-13587:
--

[~juliet] I get the concerns about Spark supporting a complex virtualenv 
process. My main objection to only supporting something like --pyspark-python 
is the difficulty we currently face in environments like Amazon EMR, and really 
in any Spark cluster where nodes may be added after an application has been 
submitted.

We have a bootstrap script which provisions our EMR nodes with required Python 
dependencies. This approach works alright for a cluster which tends to run very 
few applications, but if we have multiple tenants, this approach quickly gets 
unwieldy. Ideally, Spark applications could be submitted from a master node 
with a user never having to worry about dependency management at the node 
bootstrapping level.

I was thinking an interesting approach to this problem would be to provide some 
sort of --bootstrap option to spark-submit which points to an executable that 
Spark runs and checks for a 0 exit code before launching the application 
itself. This script could execute any code, such as creating a virtualenv or 
conda env and installing requirements. If a non-zero exit code were returned, 
the Spark application would not launch.

This generalization keeps the Spark community from having to support 
conda/virtualenv eccentricities. Thoughts?

> Support virtualenv in PySpark
> -
>
> Key: SPARK-13587
> URL: https://issues.apache.org/jira/browse/SPARK-13587
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Jeff Zhang
>
> Currently, it's not easy for users to add third-party Python packages in 
> pyspark.
> * One way is to use --py-files (suitable for simple dependencies, but not for 
> complicated ones, especially those with transitive dependencies)
> * Another way is to install packages manually on each node (time-consuming, 
> and not easy to switch between environments)
> Python now has two different virtualenv implementations: native virtualenv 
> and conda. This JIRA is about bringing these two tools to the distributed 
> environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13737) Add getOrCreate method for HiveContext

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13737:


Assignee: (was: Apache Spark)

> Add getOrCreate method for HiveContext
> --
>
> Key: SPARK-13737
> URL: https://issues.apache.org/jira/browse/SPARK-13737
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Mao, Wei
>
> There is a "getOrCreate" method in SQLContext, which is useful to recoverable 
> streaming application with SQL operation. 
> https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations
> But the corresponding method is missing in HiveContext. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13737) Add getOrCreate method for HiveContext

2016-03-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184209#comment-15184209
 ] 

Apache Spark commented on SPARK-13737:
--

User 'mwws' has created a pull request for this issue:
https://github.com/apache/spark/pull/11571

> Add getOrCreate method for HiveContext
> --
>
> Key: SPARK-13737
> URL: https://issues.apache.org/jira/browse/SPARK-13737
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Mao, Wei
>
> There is a "getOrCreate" method in SQLContext, which is useful to recoverable 
> streaming application with SQL operation. 
> https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations
> But the corresponding method is missing in HiveContext. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13737) Add getOrCreate method for HiveContext

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13737:


Assignee: Apache Spark

> Add getOrCreate method for HiveContext
> --
>
> Key: SPARK-13737
> URL: https://issues.apache.org/jira/browse/SPARK-13737
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Mao, Wei
>Assignee: Apache Spark
>
> There is a "getOrCreate" method in SQLContext, which is useful to recoverable 
> streaming application with SQL operation. 
> https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations
> But the corresponding method is missing in HiveContext. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13737) Add getOrCreate method for HiveContext

2016-03-07 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184208#comment-15184208
 ] 

Xiao Li commented on SPARK-13737:
-

I think SQLContext and HiveContext are being combined. Not sure whether we 
should wait until the merge is done. 

> Add getOrCreate method for HiveContext
> --
>
> Key: SPARK-13737
> URL: https://issues.apache.org/jira/browse/SPARK-13737
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Mao, Wei
>
> There is a "getOrCreate" method in SQLContext, which is useful to recoverable 
> streaming application with SQL operation. 
> https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations
> But the corresponding method is missing in HiveContext. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-529) Have a single file that controls the environmental variables and spark config options

2016-03-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184206#comment-15184206
 ] 

Apache Spark commented on SPARK-529:


User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/11570

> Have a single file that controls the environmental variables and spark config 
> options
> -
>
> Key: SPARK-529
> URL: https://issues.apache.org/jira/browse/SPARK-529
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Reynold Xin
>
> E.g. multiple places in the code base uses SPARK_MEM and has its own default 
> set to 512. We need a central place to enforce default values as well as 
> documenting the variables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13737) Add getOrCreate method for HiveContext

2016-03-07 Thread Mao, Wei (JIRA)
Mao, Wei created SPARK-13737:


 Summary: Add getOrCreate method for HiveContext
 Key: SPARK-13737
 URL: https://issues.apache.org/jira/browse/SPARK-13737
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Mao, Wei


There is a "getOrCreate" method in SQLContext, which is useful for recoverable 
streaming applications that use SQL operations. 
https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations

But the corresponding method is missing from HiveContext. 
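A minimal sketch of the pattern the description refers to, using the existing SQLContext.getOrCreate; the socket source and names are illustrative only, not from the JIRA:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sc = new SparkContext(new SparkConf().setAppName("sql-on-stream"))
val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.socketTextStream("localhost", 9999)

lines.foreachRDD { rdd =>
  // getOrCreate makes this closure safe to replay after checkpoint recovery;
  // the JIRA asks for an equivalent on HiveContext.
  val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
  import sqlContext.implicits._
  rdd.toDF("line").registerTempTable("lines")
  sqlContext.sql("SELECT count(*) FROM lines").show()
}
ssc.start()
ssc.awaitTermination()
{code}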



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13723) YARN - Change behavior of --num-executors when spark.dynamicAllocation.enabled true

2016-03-07 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184177#comment-15184177
 ] 

Saisai Shao commented on SPARK-13723:
-

Yes, I agree with what [~tgraves] described above. Normally the use case is 
that a user intentionally enables dynamic allocation but does not change the 
existing submit command (not knowing it matters). In that case dynamic 
allocation fails to start, which is not what they expect.

> YARN - Change behavior of --num-executors when 
> spark.dynamicAllocation.enabled true
> ---
>
> Key: SPARK-13723
> URL: https://issues.apache.org/jira/browse/SPARK-13723
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>Priority: Minor
>
> I think we should change the behavior when --num-executors is specified while 
> dynamic allocation is enabled. Currently, if --num-executors is specified, 
> dynamic allocation is disabled and a static number of executors is used.
> I would rather see the default behavior changed in the 2.x line. If the 
> dynamic allocation config is on, then --num-executors would set both the 
> maximum and the initial number of executors. I think this would let users 
> easily cap their usage while still allowing executors to be freed up. It 
> would also let users doing ML start out with a number of executors, and if 
> they are actually caching the data those executors wouldn't be freed up. So 
> you would get behavior very similar to dynamic allocation being off.
> Part of the reason for this is that using a static number generally wastes 
> resources, especially with people doing ad hoc things in spark-shell. It also 
> has a big effect when people are running MapReduce/ETL type workloads.
> The problem is that people are used to specifying --num-executors, so if we 
> turn dynamic allocation on by default in a cluster config it's just 
> overridden.
> We should also update the spark-submit --help description for --num-executors
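For context, a sketch of the settings involved; the numbers are illustrative, and the remapping of --num-executors described above is a proposal, not current behavior:

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  // Today, also setting --num-executors (spark.executor.instances) disables
  // dynamic allocation. The proposal is to treat it roughly as:
  .set("spark.dynamicAllocation.initialExecutors", "50")
  .set("spark.dynamicAllocation.maxExecutors", "50")
{code}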



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12555) Datasets: data is corrupted when input data is reordered

2016-03-07 Thread Luciano Resende (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luciano Resende updated SPARK-12555:

Issue Type: Sub-task  (was: Bug)
Parent: SPARK-13736

> Datasets: data is corrupted when input data is reordered
> 
>
> Key: SPARK-12555
> URL: https://issues.apache.org/jira/browse/SPARK-12555
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: ALL platforms on 1.6
>Reporter: Tim Preece
>
> Testcase
> ---
> {code}
> import org.apache.spark.sql.expressions.Aggregator
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.Dataset
> case class people(age: Int, name: String)
> object nameAgg extends Aggregator[people, String, String] {
>   def zero: String = ""
>   def reduce(b: String, a: people): String = a.name + b
>   def merge(b1: String, b2: String): String = b1 + b2
>   def finish(r: String): String = r
> }
> object DataSetAgg {
>   def main(args: Array[String]) {
> val conf = new SparkConf().setAppName("DataSetAgg")
> val spark = new SparkContext(conf)
> val sqlContext = new SQLContext(spark)
> import sqlContext.implicits._
> val peopleds: Dataset[people] = sqlContext.sql("SELECT 'Tim Preece' AS 
> name, 1279869254 AS age").as[people]
> peopleds.groupBy(_.age).agg(nameAgg.toColumn).show()
>   }
> }
> {code}
> Result ( on a Little Endian Platform )
> 
> {noformat}
> +--+--+
> |_1|_2|
> +--+--+
> |1279869254|FAILTi|
> +--+--+
> {noformat}
> Explanation
> ---
> Internally the String variable in the unsafe row is not updated after an 
> unsafe row join operation.
> The displayed string is corrupted and shows part of the integer ( interpreted 
> as a string ) along with "Ti"
> The column names also look different on a Little Endian platform.
> Result ( on a Big Endian Platform )
> {noformat}
> +--+--+
> | value|nameAgg$(name,age)|
> +--+--+
> |1279869254|LIAFTi|
> +--+--+
> {noformat}
> The following Unit test also fails ( but only explicitly on a Big Endian 
> platorm )
> org.apache.spark.sql.DatasetAggregatorSuite
> - typed aggregation: class input with reordering *** FAILED ***
>   Results do not match for query:
>   == Parsed Logical Plan ==
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Analyzed Logical Plan ==
>   value: string, ClassInputAgg$(b,a): int
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Optimized Logical Plan ==
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Physical Plan ==
>   TungstenAggregate(key=[value#748], 
> functions=[(ClassInputAgg$(b#650,a#651),mode=Final,isDistinct=false)], 
> output=[value#748,ClassInputAgg$(b,a)#762])
>   +- TungstenExchange hashpartitioning(value#748,5), None
>  +- TungstenAggregate(key=[value#748], 
> functions=[(ClassInputAgg$(b#650,a#651),mode=Partial,isDistinct=false)], 
> output=[value#748,value#758])
> +- !AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>+- Project [one AS b#650,1 AS a#651]
>   +- Scan OneRowRelation[]
>   == Results ==
>   !== Correct Answer - 1 ==   == Spark Answer - 1 ==
>   ![one,1][one,9] (QueryTest.scala:127)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12319) ExchangeCoordinatorSuite fails on big-endian platforms

2016-03-07 Thread Luciano Resende (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luciano Resende updated SPARK-12319:

Issue Type: Sub-task  (was: Bug)
Parent: SPARK-13736

> ExchangeCoordinatorSuite fails on big-endian platforms
> --
>
> Key: SPARK-12319
> URL: https://issues.apache.org/jira/browse/SPARK-12319
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: Problems apparent on BE, LE could be impacted too
>Reporter: Adam Roberts
>Priority: Critical
>
> JIRA to cover endian specific problems - since testing 1.6 I've noticed 
> problems with DataFrames on BE platforms, e.g. 
> https://issues.apache.org/jira/browse/SPARK-9858
> [~joshrosen] [~yhuai]
> Current progress: using com.google.common.io.LittleEndianDataInputStream and 
> com.google.common.io.LittleEndianDataOutputStream within UnsafeRowSerializer 
> fixes three test failures in ExchangeCoordinatorSuite but I'm concerned 
> around performance/wider functional implications
> "org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input 
> with reordering" fails as we expect "one, 1" but instead get "one, 9" - we 
> believe the issue lies within BitSetMethods.java, specifically around: return 
> (wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word); 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13736) Big-Endian platform issues

2016-03-07 Thread Luciano Resende (JIRA)
Luciano Resende created SPARK-13736:
---

 Summary: Big-Endian platform issues
 Key: SPARK-13736
 URL: https://issues.apache.org/jira/browse/SPARK-13736
 Project: Spark
  Issue Type: Epic
  Components: SQL
Affects Versions: 1.6.0
Reporter: Luciano Resende
Priority: Critical


We are starting to see a few issues when building/testing on big-endian 
platforms. This serves as an umbrella JIRA to group all platform-specific 
issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13656) Delete spark.sql.parquet.cacheMetadata

2016-03-07 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184110#comment-15184110
 ] 

Takeshi Yamamuro commented on SPARK-13656:
--

Okay, I'll make a pr in a day.

> Delete spark.sql.parquet.cacheMetadata
> --
>
> Key: SPARK-13656
> URL: https://issues.apache.org/jira/browse/SPARK-13656
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Yin Huai
>
> Looks like spark.sql.parquet.cacheMetadata is not used anymore. Let's delete 
> it to avoid any potential confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13734) SparkR histogram

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13734:


Assignee: (was: Apache Spark)

> SparkR histogram
> 
>
> Key: SPARK-13734
> URL: https://issues.apache.org/jira/browse/SPARK-13734
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13734) SparkR histogram

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13734:


Assignee: Apache Spark

> SparkR histogram
> 
>
> Key: SPARK-13734
> URL: https://issues.apache.org/jira/browse/SPARK-13734
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13734) SparkR histogram

2016-03-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184106#comment-15184106
 ] 

Apache Spark commented on SPARK-13734:
--

User 'olarayej' has created a pull request for this issue:
https://github.com/apache/spark/pull/11569

> SparkR histogram
> 
>
> Key: SPARK-13734
> URL: https://issues.apache.org/jira/browse/SPARK-13734
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13418) SQL generation for uncorrelated scalar subqueries

2016-03-07 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-13418.
--
   Resolution: Duplicate
Fix Version/s: 2.0.0

> SQL generation for uncorrelated scalar subqueries
> -
>
> Key: SPARK-13418
> URL: https://issues.apache.org/jira/browse/SPARK-13418
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
> Fix For: 2.0.0
>
>
> This is pretty difficult right now because SQLBuilder is in the hive package, 
> whereas the sql function for ScalarSubquery is defined in catalyst package.
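For reference, a sketch of the kind of query involved; the table and column names are made up:

{code}
// The inner SELECT is an uncorrelated scalar subquery: it returns a single
// value and does not reference columns from the outer query.
sqlContext.sql(
  """
    |SELECT name, salary,
    |       (SELECT max(salary) FROM employees) AS top_salary
    |FROM employees
  """.stripMargin)
{code}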



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13727) SparkConf.contains does not consider deprecated keys

2016-03-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184095#comment-15184095
 ] 

Apache Spark commented on SPARK-13727:
--

User 'bomeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/11568

> SparkConf.contains does not consider deprecated keys
> 
>
> Key: SPARK-13727
> URL: https://issues.apache.org/jira/browse/SPARK-13727
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> This makes it kinda inconsistent with other SparkConf APIs. For example:
> {code}
> scala> import org.apache.spark.SparkConf
> import org.apache.spark.SparkConf
> scala> val conf = new SparkConf().set("spark.io.compression.lz4.block.size", 
> "12345")
> 16/03/07 10:55:17 WARN spark.SparkConf: The configuration key 
> 'spark.io.compression.lz4.block.size' has been deprecated as of Spark 1.4 and 
> and may be removed in the future. Please use the new key 
> 'spark.io.compression.lz4.blockSize' instead.
> conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@221e8982
> scala> conf.get("spark.io.compression.lz4.blockSize")
> res0: String = 12345
> scala> conf.contains("spark.io.compression.lz4.blockSize")
> res1: Boolean = false
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13727) SparkConf.contains does not consider deprecated keys

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13727:


Assignee: (was: Apache Spark)

> SparkConf.contains does not consider deprecated keys
> 
>
> Key: SPARK-13727
> URL: https://issues.apache.org/jira/browse/SPARK-13727
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> This makes it kinda inconsistent with other SparkConf APIs. For example:
> {code}
> scala> import org.apache.spark.SparkConf
> import org.apache.spark.SparkConf
> scala> val conf = new SparkConf().set("spark.io.compression.lz4.block.size", 
> "12345")
> 16/03/07 10:55:17 WARN spark.SparkConf: The configuration key 
> 'spark.io.compression.lz4.block.size' has been deprecated as of Spark 1.4 and 
> and may be removed in the future. Please use the new key 
> 'spark.io.compression.lz4.blockSize' instead.
> conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@221e8982
> scala> conf.get("spark.io.compression.lz4.blockSize")
> res0: String = 12345
> scala> conf.contains("spark.io.compression.lz4.blockSize")
> res1: Boolean = false
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13727) SparkConf.contains does not consider deprecated keys

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13727:


Assignee: Apache Spark

> SparkConf.contains does not consider deprecated keys
> 
>
> Key: SPARK-13727
> URL: https://issues.apache.org/jira/browse/SPARK-13727
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Marcelo Vanzin
>Assignee: Apache Spark
>Priority: Minor
>
> This makes it kinda inconsistent with other SparkConf APIs. For example:
> {code}
> scala> import org.apache.spark.SparkConf
> import org.apache.spark.SparkConf
> scala> val conf = new SparkConf().set("spark.io.compression.lz4.block.size", 
> "12345")
> 16/03/07 10:55:17 WARN spark.SparkConf: The configuration key 
> 'spark.io.compression.lz4.block.size' has been deprecated as of Spark 1.4 and 
> and may be removed in the future. Please use the new key 
> 'spark.io.compression.lz4.blockSize' instead.
> conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@221e8982
> scala> conf.get("spark.io.compression.lz4.blockSize")
> res0: String = 12345
> scala> conf.contains("spark.io.compression.lz4.blockSize")
> res1: Boolean = false
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13735) Log for parquet relation reading files is too verbose

2016-03-07 Thread Zhong Wang (JIRA)
Zhong Wang created SPARK-13735:
--

 Summary: Log for parquet relation reading files is too verbose
 Key: SPARK-13735
 URL: https://issues.apache.org/jira/browse/SPARK-13735
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.6.0
Reporter: Zhong Wang
Priority: Trivial


The INFO-level logging lists every file read by the Parquet relation, which is 
far too verbose when the input contains many files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-6666) org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names

2016-03-07 Thread Luciano Resende (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luciano Resende closed SPARK-6666.
--
Resolution: Cannot Reproduce

I have tried the scenarios above on Spark trunk using both Postgres and DB2; 
see:
https://github.com/lresende/spark-sandbox/blob/master/src/main/scala/com/luck/sql/JDBCApplication.scala

The described issues no longer seem reproducible; see all results below.

root
 |-- Symbol: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- Sector: string (nullable = true)
 |-- Price: double (nullable = true)
 |-- Dividend Yield: double (nullable = true)
 |-- Price/Earnings: double (nullable = true)
 |-- Earnings/Share: double (nullable = true)
 |-- Book Value: double (nullable = true)
 |-- 52 week low: double (nullable = true)
 |-- 52 week high: double (nullable = true)
 |-- Market Cap: double (nullable = true)
 |-- EBITDA: double (nullable = true)
 |-- Price/Sales: double (nullable = true)
 |-- Price/Book: double (nullable = true)
 |-- SEC Filings: string (nullable = true)

+--+--+--+-+--+--+--+--+---++--+--+---+--+---+
|Symbol|  Name|Sector|Price|Dividend Yield|Price/Earnings|Earnings/Share|Book 
Value|52 week low|52 week high|Market Cap|EBITDA|Price/Sales|Price/Book|SEC 
Filings|
+--+--+--+-+--+--+--+--+---++--+--+---+--+---+
|S1|Name 1| Sec 1| 10.0|  10.0|  10.0|  10.0|  
10.0|   10.0|10.0|  10.0|  10.0|   10.0|  10.0|
100|
|s2|Name 2| Sec 2| 20.0|  20.0|  20.0|  20.0|  
20.0|   20.0|20.0|  20.0|  20.0|   20.0|  20.0|
200|
+--+--+--+-+--+--+--+--+---++--+--+---+--+---+

+--+
|AvgCPI|
+--+
|  15.0|
+--+

+--+--+--+-+--+--+--+--+---++--+--+---+--+---+
|Symbol|  Name|Sector|Price|Dividend Yield|Price/Earnings|Earnings/Share|Book 
Value|52 week low|52 week high|Market Cap|EBITDA|Price/Sales|Price/Book|SEC 
Filings|
+--+--+--+-+--+--+--+--+---++--+--+---+--+---+
|S1|Name 1| Sec 1| 10.0|  10.0|  10.0|  10.0|  
10.0|   10.0|10.0|  10.0|  10.0|   10.0|  10.0|
100|
|s2|Name 2| Sec 2| 20.0|  20.0|  20.0|  20.0|  
20.0|   20.0|20.0|  20.0|  20.0|   20.0|  20.0|
200|
+--+--+--+-+--+--+--+--+---++--+--+---+--+---+



> org.apache.spark.sql.jdbc.JDBCRDD  does not escape/quote column names
> -
>
> Key: SPARK-6666
> URL: https://issues.apache.org/jira/browse/SPARK-6666
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
> Environment:  
>Reporter: John Ferguson
>Priority: Critical
>
> Is there a way to have JDBC DataFrames use quoted/escaped column names?  
> Right now, it looks like it "sees" the names correctly in the schema created 
> but does not escape them in the SQL it creates when they are not compliant:
> org.apache.spark.sql.jdbc.JDBCRDD
> 
> private val columnList: String = {
> val sb = new StringBuilder()
> columns.foreach(x => sb.append(",").append(x))
> if (sb.length == 0) "1" else sb.substring(1)
> }
> If you see value in this, I would take a shot at adding the quoting 
> (escaping) of column names here. If you don't, some drivers, like 
> postgresql's, will simply fold all names to lower case when parsing the 
> query. As you can see in the TL;DR below, that means they won't match the 
> schema I am given.
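A minimal sketch of the quoting suggested above (illustrative only; a real fix would use the JDBC dialect's identifier quote character rather than a hard-coded double quote):

{code}
// Build the projection list with each column name quoted, so identifiers
// such as "Earnings/Share" survive the round trip to the database.
def columnList(columns: Seq[String], quoteChar: String = "\""): String =
  if (columns.isEmpty) "1"
  else columns.map(c => quoteChar + c + quoteChar).mkString(",")
{code}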
> TL;DR:
>  
> I am able to connect to a Postgres database in the shell (with driver 
> referenced):
>val jdbcDf = 
> sqlContext.jdbc("jdbc:postgresql://localhost/sparkdemo?user=dbuser", "sp500")
> In fact when I run:
>jdbcDf.registerTempTable("sp500")
>val avgEPSNamed = sqlContext.sql("SELECT AVG(`Earnings/Share`) as AvgCPI 
> FROM sp500")
> and
>val avgEPSProg = jsonDf.agg(avg(jsonDf.col("Earnings/Share")))
> The values come back as expected.  However, if I try:
>jdbcDf.show
> Or if I try
>
>val all = sqlContext.sql("SELECT * FROM sp500")
>all.show
> I get errors about column names not be

[jira] [Commented] (SPARK-13726) Spark 1.6.0 stopping working for HiveThriftServer2 and registerTempTable

2016-03-07 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184077#comment-15184077
 ] 

Yin Huai commented on SPARK-13726:
--

I think that after setting that conf to true, the behavior should be the same 
as Spark 1.5. You can still run multiple JDBC connections. For running multiple 
queries concurrently, as long as you set spark.sql.thriftserver.scheduler.pool 
for different JDBC connections like before, it should work (see 
http://spark.apache.org/docs/latest/sql-programming-guide.html#scheduling).

In the long term, I think we will keep the current behavior. When you need to 
share temp tables across sessions, you have to set 
spark.sql.hive.thriftServer.singleSession to true. 
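A minimal sketch of the two settings mentioned above; the pool name and the way the conf is supplied are assumptions, not taken from the JIRA:

{code}
import org.apache.spark.SparkConf

// When starting the Thrift server (or in spark-defaults.conf):
val conf = new SparkConf()
  .set("spark.sql.hive.thriftServer.singleSession", "true")

// Then, per JDBC connection, to run queries in a separate fair-scheduler pool:
//   SET spark.sql.thriftserver.scheduler.pool=poolA;
{code}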

> Spark 1.6.0 stopping working for HiveThriftServer2 and registerTempTable
> 
>
> Key: SPARK-13726
> URL: https://issues.apache.org/jira/browse/SPARK-13726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Nguyen
>Priority: Blocker
>
> In Spark 1.5.2, DataFrame.registerTempTable works and  
> hiveContext.table(registerTableName) and HiveThriftServer2 see those tables.
> In Spark 1.6.0, hiveContext.table(registerTableName) and HiveThriftServer2 do 
> not see those tables, even though DataFrame.registerTempTable does not return 
> an error.
> Since this feature used to work in Spark 1.5.2, there is existing code that 
> breaks after upgrading to Spark 1.6.0. so this issue is a blocker and urgent. 
> Therefore, please have it fixed asap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work

2016-03-07 Thread nicerobot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184075#comment-15184075
 ] 

nicerobot commented on SPARK-10548:
---

Thanks [~zsxwing]. What's the recommended way to accomplish that?

> Concurrent execution in SQL does not work
> -
>
> Key: SPARK-10548
> URL: https://issues.apache.org/jira/browse/SPARK-10548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> From the mailing list:
> {code}
> future { df1.count() } 
> future { df2.count() } 
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>  
> at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
> at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> === edit ===
> Simple reproduction:
> {code}
> (1 to 100).par.foreach { _ =>
>   sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13734) SparkR histogram

2016-03-07 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-13734:

Summary: SparkR histogram  (was: Histogram)

> SparkR histogram
> 
>
> Key: SPARK-13734
> URL: https://issues.apache.org/jira/browse/SPARK-13734
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13726) Spark 1.6.0 stopping working for HiveThriftServer2 and registerTempTable

2016-03-07 Thread Michael Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184062#comment-15184062
 ] 

Michael Nguyen commented on SPARK-13726:


That works. Thanks, Yin, for the workaround. Does setting 
spark.sql.hive.thriftServer.singleSession to true

1. limit Hive to supporting only one JDBC connection or one query at a time? Or
2. hurt performance when running multiple queries from multiple JDBC 
connections at the same time?

If so, could you provide a long-term solution that does not have these issues?



> Spark 1.6.0 stopping working for HiveThriftServer2 and registerTempTable
> 
>
> Key: SPARK-13726
> URL: https://issues.apache.org/jira/browse/SPARK-13726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Nguyen
>Priority: Blocker
>
> In Spark 1.5.2, DataFrame.registerTempTable works and  
> hiveContext.table(registerTableName) and HiveThriftServer2 see those tables.
> In Spark 1.6.0, hiveContext.table(registerTableName) and HiveThriftServer2 do 
> not see those tables, even though DataFrame.registerTempTable does not return 
> an error.
> Since this feature used to work in Spark 1.5.2, there is existing code that 
> breaks after upgrading to Spark 1.6.0. so this issue is a blocker and urgent. 
> Therefore, please have it fixed asap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13734) Histogram

2016-03-07 Thread Oscar D. Lara Yejas (JIRA)
Oscar D. Lara Yejas created SPARK-13734:
---

 Summary: Histogram
 Key: SPARK-13734
 URL: https://issues.apache.org/jira/browse/SPARK-13734
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Oscar D. Lara Yejas






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13731) expression evaluation for NaN in select statement

2016-03-07 Thread Ian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183869#comment-15183869
 ] 

Ian edited comment on SPARK-13731 at 3/8/16 12:04 AM:
--

The expression in the select essentially defines a transformation over data 
residing in storage, or even over another RDD. We are seeing that the 
transformation result is now null for both NaN and Infinity.

We saw SPARK-9076, which seemed to address only how NaN values in an RDD can 
be handled (equality, ordering, ...), 
but our case concerns another, more fundamental aspect: the expressions that 
might produce NaN. 

 
 



was (Author: ianlcsd):
The expression in select essentially defines a transformation from data 
residing on storage or even another RDD. We are seeing that the transformation 
result is now null for both NaN and Infinity.

We saw SPARK-9076, which seemed addressing only how NaN value in RDD can be 
handled(comparing,ordering, ...), 
but our concerned case is more on the another aspect and more fundamental about 
the expressions that might produce NaN. 

 
 


> expression evaluation for NaN in select statement
> -
>
> Key: SPARK-13731
> URL: https://issues.apache.org/jira/browse/SPARK-13731
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Ian
>
> We are expecting that arithmetic expression a/b should be:
> 1. returning NaN if a=0 and b=0
> 2. returning Infinity if a=1 and b=0
> Is the expectation reasonable? 
> The following is a simple test case snippet that reads from storage and 
> evaluates arithmetic expressions in select.
> It is assuming org.apache.spark.sql.hive.execution.SQLQuerySuite: 
> {code}
>   test("Expression should be evaluated to Nan/Infinity in Select") {
> withTable("testNan") {
>   withTempTable("src") {
> Seq((1d, 0d), (0d, 0d)).toDF().registerTempTable("src")
> sql("CREATE TABLE testNan(a double, b double) STORED AS PARQUET AS 
> SELECT * FROM src")
>   }
>   checkAnswer(sql(
> """
>   |SELECT a/b FROM testNan
> """.stripMargin),
> Seq(
>   Row(Double.PositiveInfinity),
>   Row(Double.NaN)
> )
>   )
> }
>   }
> == Physical Plan ==
> Project [(a#28 / b#29) AS _c0#30]
> +- Scan ParquetRelation: default.testnan[a#28,b#29] InputPaths: 
> file:/private/var/folders/dy/19y6pfm92pj9s40mbs8xd9hmgp/T/warehouse--5b617080-e909-4812-90e8-63d2dd0aef5a/testnan
> == Results ==
> !== Correct Answer - 2 ==   == Spark Answer - 2 ==
> ![Infinity] [null]
> ![NaN]  [null]
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13734) Histogram

2016-03-07 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184057#comment-15184057
 ] 

Oscar D. Lara Yejas commented on SPARK-13734:
-

I'm working on this one.

> Histogram
> -
>
> Key: SPARK-13734
> URL: https://issues.apache.org/jira/browse/SPARK-13734
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13731) expression evaluation for NaN in select statement

2016-03-07 Thread Ian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian updated SPARK-13731:

Description: 
We are expecting that arithmetic expression a/b should be:
1. returning NaN if a=0 and b=0
2. returning Infinity if a=1 and b=0

Is the expectation reasonable? 
The following is a simple test case snippet that reads from storage and 
evaluates arithmetic expressions in select.
It is assuming org.apache.spark.sql.hive.execution.SQLQuerySuite: 
{code}
  test("Expression should be evaluated to Nan/Infinity in Select") {
withTable("testNan") {
  withTempTable("src") {
Seq((1d, 0d), (0d, 0d)).toDF().registerTempTable("src")
sql("CREATE TABLE testNan(a double, b double) STORED AS PARQUET AS 
SELECT * FROM src")
  }

  checkAnswer(sql(
"""
  |SELECT a/b FROM testNan
""".stripMargin),
Seq(
  Row(Double.PositiveInfinity),
  Row(Double.NaN)
)
  )
}
  }


== Physical Plan ==
Project [(a#28 / b#29) AS _c0#30]
+- Scan ParquetRelation: default.testnan[a#28,b#29] InputPaths: 
file:/private/var/folders/dy/19y6pfm92pj9s40mbs8xd9hmgp/T/warehouse--5b617080-e909-4812-90e8-63d2dd0aef5a/testnan
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
![Infinity] [null]
![NaN]  [null]
  
{code}

  was:
We are expecting that arithmetic expression a/b should be:
1. returning NaN if a=0 and b=0
2. returning Infinity if a=1 and b=0

Is the expectation reasonable? 
The following is a simple test case snippet that read from storage and evaluate 
arithmetic in select.
It si assuming org.apache.spark.sql.hive.execution.SQLQuerySuite: 
{code}
  test("Expression should be evaluated to Nan/Infinity in Select") {
withTable("testNan") {
  withTempTable("src") {
Seq((1d, 0d), (0d, 0d)).toDF().registerTempTable("src")
sql("CREATE TABLE testNan(a double, b double) STORED AS PARQUET AS 
SELECT * FROM src")
  }

  checkAnswer(sql(
"""
  |SELECT a/b FROM testNan
""".stripMargin),
Seq(
  Row(Double.PositiveInfinity),
  Row(Double.NaN)
)
  )
}
  }


== Physical Plan ==
Project [(a#28 / b#29) AS _c0#30]
+- Scan ParquetRelation: default.testnan[a#28,b#29] InputPaths: 
file:/private/var/folders/dy/19y6pfm92pj9s40mbs8xd9hmgp/T/warehouse--5b617080-e909-4812-90e8-63d2dd0aef5a/testnan
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
![Infinity] [null]
![NaN]  [null]
  
{code}


> expression evaluation for NaN in select statement
> -
>
> Key: SPARK-13731
> URL: https://issues.apache.org/jira/browse/SPARK-13731
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Ian
>
> We are expecting that arithmetic expression a/b should be:
> 1. returning NaN if a=0 and b=0
> 2. returning Infinity if a=1 and b=0
> Is the expectation reasonable? 
> The following is a simple test case snippet that reads from storage and 
> evaluates arithmetic expressions in select.
> It is assuming org.apache.spark.sql.hive.execution.SQLQuerySuite: 
> {code}
>   test("Expression should be evaluated to Nan/Infinity in Select") {
> withTable("testNan") {
>   withTempTable("src") {
> Seq((1d, 0d), (0d, 0d)).toDF().registerTempTable("src")
> sql("CREATE TABLE testNan(a double, b double) STORED AS PARQUET AS 
> SELECT * FROM src")
>   }
>   checkAnswer(sql(
> """
>   |SELECT a/b FROM testNan
> """.stripMargin),
> Seq(
>   Row(Double.PositiveInfinity),
>   Row(Double.NaN)
> )
>   )
> }
>   }
> == Physical Plan ==
> Project [(a#28 / b#29) AS _c0#30]
> +- Scan ParquetRelation: default.testnan[a#28,b#29] InputPaths: 
> file:/private/var/folders/dy/19y6pfm92pj9s40mbs8xd9hmgp/T/warehouse--5b617080-e909-4812-90e8-63d2dd0aef5a/testnan
> == Results ==
> !== Correct Answer - 2 ==   == Spark Answer - 2 ==
> ![Infinity] [null]
> ![NaN]  [null]
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13648) org.apache.spark.sql.hive.client.VersionsSuite fails NoClassDefFoundError on IBM JDK

2016-03-07 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-13648.
--
   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 11495
[https://github.com/apache/spark/pull/11495]

> org.apache.spark.sql.hive.client.VersionsSuite fails NoClassDefFoundError on 
> IBM JDK
> 
>
> Key: SPARK-13648
> URL: https://issues.apache.org/jira/browse/SPARK-13648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
> Environment: Fails on vendor specific JVMs ( e.g IBM JVM )
>Reporter: Tim Preece
>Priority: Minor
> Fix For: 2.0.0, 1.6.1
>
>
> When running the standard Spark unit tests on the IBM Java SDK the hive 
> VersionsSuite fails with the following error.
> java.lang.NoClassDefFoundError:  org.apache.hadoop.hive.cli.CliSessionState 
> when creating Hive client using classpath: ..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13648) org.apache.spark.sql.hive.client.VersionsSuite fails NoClassDefFoundError on IBM JDK

2016-03-07 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-13648:
-
Fix Version/s: (was: 1.6.1)
   1.6.2

> org.apache.spark.sql.hive.client.VersionsSuite fails NoClassDefFoundError on 
> IBM JDK
> 
>
> Key: SPARK-13648
> URL: https://issues.apache.org/jira/browse/SPARK-13648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
> Environment: Fails on vendor specific JVMs ( e.g IBM JVM )
>Reporter: Tim Preece
>Priority: Minor
> Fix For: 1.6.2, 2.0.0
>
>
> When running the standard Spark unit tests on the IBM Java SDK the hive 
> VersionsSuite fails with the following error.
> java.lang.NoClassDefFoundError:  org.apache.hadoop.hive.cli.CliSessionState 
> when creating Hive client using classpath: ..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12458) Add ExpressionDescription to datetime functions

2016-03-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184038#comment-15184038
 ] 

Apache Spark commented on SPARK-12458:
--

User 'dilipbiswal' has created a pull request for this issue:
https://github.com/apache/spark/pull/10428

> Add ExpressionDescription to datetime functions
> ---
>
> Key: SPARK-12458
> URL: https://issues.apache.org/jira/browse/SPARK-12458
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12458) Add ExpressionDescription to datetime functions

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12458:


Assignee: Apache Spark

> Add ExpressionDescription to datetime functions
> ---
>
> Key: SPARK-12458
> URL: https://issues.apache.org/jira/browse/SPARK-12458
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12458) Add ExpressionDescription to datetime functions

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12458:


Assignee: (was: Apache Spark)

> Add ExpressionDescription to datetime functions
> ---
>
> Key: SPARK-12458
> URL: https://issues.apache.org/jira/browse/SPARK-12458
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13731) expression evaluation for NaN in select statement

2016-03-07 Thread Ian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184034#comment-15184034
 ] 

Ian edited comment on SPARK-13731 at 3/7/16 11:41 PM:
--

The test case we provided uses a simple arithmetic expression, division of 
doubles, but in fact many other math functions show the same behavior and 
return null for NaN/Infinity. 
For instance, log() and corr(). 
{code}
SELECT log(a/b), corr(a, b) FROM testNan order by a, b
{code}



was (Author: ianlcsd):
The test case we provided is using simple arithmetic expression like divisions 
for double, 
but in fact, many math other functions are having the same behaviors that 
returns null for NaN/Infinity. 
For instance, log() and corr(). 
{code}
SELECT log(a/b), corn(a, b) FROM testNan order by a, b
{code}


> expression evaluation for NaN in select statement
> -
>
> Key: SPARK-13731
> URL: https://issues.apache.org/jira/browse/SPARK-13731
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Ian
>
> We are expecting that arithmetic expression a/b should be:
> 1. returning NaN if a=0 and b=0
> 2. returning Infinity if a=1 and b=0
> Is the expectation reasonable? 
> The following is a simple test case snippet that reads from storage and 
> evaluates arithmetic in select.
> It is assuming org.apache.spark.sql.hive.execution.SQLQuerySuite: 
> {code}
>   test("Expression should be evaluated to Nan/Infinity in Select") {
> withTable("testNan") {
>   withTempTable("src") {
> Seq((1d, 0d), (0d, 0d)).toDF().registerTempTable("src")
> sql("CREATE TABLE testNan(a double, b double) STORED AS PARQUET AS 
> SELECT * FROM src")
>   }
>   checkAnswer(sql(
> """
>   |SELECT a/b FROM testNan
> """.stripMargin),
> Seq(
>   Row(Double.PositiveInfinity),
>   Row(Double.NaN)
> )
>   )
> }
>   }
> == Physical Plan ==
> Project [(a#28 / b#29) AS _c0#30]
> +- Scan ParquetRelation: default.testnan[a#28,b#29] InputPaths: 
> file:/private/var/folders/dy/19y6pfm92pj9s40mbs8xd9hmgp/T/warehouse--5b617080-e909-4812-90e8-63d2dd0aef5a/testnan
> == Results ==
> !== Correct Answer - 2 ==   == Spark Answer - 2 ==
> ![Infinity] [null]
> ![NaN]  [null]
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13731) expression evaluation for NaN in select statement

2016-03-07 Thread Ian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184034#comment-15184034
 ] 

Ian commented on SPARK-13731:
-

The test case we provided uses a simple arithmetic expression, division of 
doubles, but in fact many other math functions show the same behavior and 
return null for NaN/Infinity. 
For instance, log() and corr(). 
{code}
SELECT log(a/b), corr(a, b) FROM testNan order by a, b
{code}


> expression evaluation for NaN in select statement
> -
>
> Key: SPARK-13731
> URL: https://issues.apache.org/jira/browse/SPARK-13731
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Ian
>
> We are expecting that arithmetic expression a/b should be:
> 1. returning NaN if a=0 and b=0
> 2. returning Infinity if a=1 and b=0
> Is the expectation reasonable? 
> The following is a simple test case snippet that reads from storage and 
> evaluates arithmetic in select.
> It is assuming org.apache.spark.sql.hive.execution.SQLQuerySuite: 
> {code}
>   test("Expression should be evaluated to Nan/Infinity in Select") {
> withTable("testNan") {
>   withTempTable("src") {
> Seq((1d, 0d), (0d, 0d)).toDF().registerTempTable("src")
> sql("CREATE TABLE testNan(a double, b double) STORED AS PARQUET AS 
> SELECT * FROM src")
>   }
>   checkAnswer(sql(
> """
>   |SELECT a/b FROM testNan
> """.stripMargin),
> Seq(
>   Row(Double.PositiveInfinity),
>   Row(Double.NaN)
> )
>   )
> }
>   }
> == Physical Plan ==
> Project [(a#28 / b#29) AS _c0#30]
> +- Scan ParquetRelation: default.testnan[a#28,b#29] InputPaths: 
> file:/private/var/folders/dy/19y6pfm92pj9s40mbs8xd9hmgp/T/warehouse--5b617080-e909-4812-90e8-63d2dd0aef5a/testnan
> == Results ==
> !== Correct Answer - 2 ==   == Spark Answer - 2 ==
> ![Infinity] [null]
> ![NaN]  [null]
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13711) Apache Spark driver stopping JVM when master not available

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13711:


Assignee: (was: Apache Spark)

> Apache Spark driver stopping JVM when master not available 
> ---
>
> Key: SPARK-13711
> URL: https://issues.apache.org/jira/browse/SPARK-13711
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.1, 1.6.0
>Reporter: Era
>
> In my application the Java Spark context is created with an unavailable master 
> URL (you may assume the master is down for maintenance). Creating the Java 
> Spark context then stops the JVM that runs the Spark driver, with JVM exit 
> code 50.
> When I checked the logs I found SparkUncaughtExceptionHandler calling 
> System.exit. My program should run forever. 
> package test.mains;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaSparkContext;
> public class CheckJavaSparkContext {
> public static void main(String[] args) {
> SparkConf conf = new SparkConf();
> conf.setAppName("test");
> conf.setMaster("spark://sunshinee:7077");
> try {
> new JavaSparkContext(conf);
> } catch (Throwable e) {
> System.out.println("Caught an exception : " + e.getMessage());
>
> }
> System.out.println("Waiting to complete...");
> while (true) {
> }
> }
> }
> Output log
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 16/03/04 18:01:15 INFO SparkContext: Running Spark version 1.6.0
> 16/03/04 18:01:17 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/03/04 18:01:17 WARN Utils: Your hostname, pesamara-mobl-vm1 resolves to a 
> loopback address: 127.0.0.1; using 10.30.9.107 instead (on interface eth0)
> 16/03/04 18:01:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 16/03/04 18:01:18 INFO SecurityManager: Changing view acls to: ps40233
> 16/03/04 18:01:18 INFO SecurityManager: Changing modify acls to: ps40233
> 16/03/04 18:01:18 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(ps40233); users 
> with modify permissions: Set(ps40233)
> 16/03/04 18:01:19 INFO Utils: Successfully started service 'sparkDriver' on 
> port 55309.
> 16/03/04 18:01:21 INFO Slf4jLogger: Slf4jLogger started
> 16/03/04 18:01:21 INFO Remoting: Starting remoting
> 16/03/04 18:01:22 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkDriverActorSystem@10.30.9.107:52128]
> 16/03/04 18:01:22 INFO Utils: Successfully started service 
> 'sparkDriverActorSystem' on port 52128.
> 16/03/04 18:01:22 INFO SparkEnv: Registering MapOutputTracker
> 16/03/04 18:01:22 INFO SparkEnv: Registering BlockManagerMaster
> 16/03/04 18:01:22 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-87c20178-357d-4252-a46a-62a755568a98
> 16/03/04 18:01:22 INFO MemoryStore: MemoryStore started with capacity 457.7 MB
> 16/03/04 18:01:22 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/03/04 18:01:23 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 16/03/04 18:01:23 INFO SparkUI: Started SparkUI at http://10.30.9.107:4040
> 16/03/04 18:01:24 INFO AppClient$ClientEndpoint: Connecting to master 
> spark://sunshinee:7077...
> 16/03/04 18:01:24 WARN AppClient$ClientEndpoint: Failed to connect to master 
> sunshinee:7077
> java.io.IOException: Failed to connect to sunshinee:7077
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
>  at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
> at 
> org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
> at org.apache.

[jira] [Assigned] (SPARK-13711) Apache Spark driver stopping JVM when master not available

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13711:


Assignee: Apache Spark

> Apache Spark driver stopping JVM when master not available 
> ---
>
> Key: SPARK-13711
> URL: https://issues.apache.org/jira/browse/SPARK-13711
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.1, 1.6.0
>Reporter: Era
>Assignee: Apache Spark
>
> In my application the Java Spark context is created with an unavailable master 
> URL (you may assume the master is down for maintenance). Creating the Java 
> Spark context then stops the JVM that runs the Spark driver, with JVM exit 
> code 50.
> When I checked the logs I found SparkUncaughtExceptionHandler calling 
> System.exit. My program should run forever. 
> package test.mains;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaSparkContext;
> public class CheckJavaSparkContext {
> public static void main(String[] args) {
> SparkConf conf = new SparkConf();
> conf.setAppName("test");
> conf.setMaster("spark://sunshinee:7077");
> try {
> new JavaSparkContext(conf);
> } catch (Throwable e) {
> System.out.println("Caught an exception : " + e.getMessage());
>
> }
> System.out.println("Waiting to complete...");
> while (true) {
> }
> }
> }
> Output log
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 16/03/04 18:01:15 INFO SparkContext: Running Spark version 1.6.0
> 16/03/04 18:01:17 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/03/04 18:01:17 WARN Utils: Your hostname, pesamara-mobl-vm1 resolves to a 
> loopback address: 127.0.0.1; using 10.30.9.107 instead (on interface eth0)
> 16/03/04 18:01:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 16/03/04 18:01:18 INFO SecurityManager: Changing view acls to: ps40233
> 16/03/04 18:01:18 INFO SecurityManager: Changing modify acls to: ps40233
> 16/03/04 18:01:18 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(ps40233); users 
> with modify permissions: Set(ps40233)
> 16/03/04 18:01:19 INFO Utils: Successfully started service 'sparkDriver' on 
> port 55309.
> 16/03/04 18:01:21 INFO Slf4jLogger: Slf4jLogger started
> 16/03/04 18:01:21 INFO Remoting: Starting remoting
> 16/03/04 18:01:22 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkDriverActorSystem@10.30.9.107:52128]
> 16/03/04 18:01:22 INFO Utils: Successfully started service 
> 'sparkDriverActorSystem' on port 52128.
> 16/03/04 18:01:22 INFO SparkEnv: Registering MapOutputTracker
> 16/03/04 18:01:22 INFO SparkEnv: Registering BlockManagerMaster
> 16/03/04 18:01:22 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-87c20178-357d-4252-a46a-62a755568a98
> 16/03/04 18:01:22 INFO MemoryStore: MemoryStore started with capacity 457.7 MB
> 16/03/04 18:01:22 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/03/04 18:01:23 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 16/03/04 18:01:23 INFO SparkUI: Started SparkUI at http://10.30.9.107:4040
> 16/03/04 18:01:24 INFO AppClient$ClientEndpoint: Connecting to master 
> spark://sunshinee:7077...
> 16/03/04 18:01:24 WARN AppClient$ClientEndpoint: Failed to connect to master 
> sunshinee:7077
> java.io.IOException: Failed to connect to sunshinee:7077
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
>  at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
> at 
> org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)

[jira] [Commented] (SPARK-13711) Apache Spark driver stopping JVM when master not available

2016-03-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184016#comment-15184016
 ] 

Apache Spark commented on SPARK-13711:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/11566

> Apache Spark driver stopping JVM when master not available 
> ---
>
> Key: SPARK-13711
> URL: https://issues.apache.org/jira/browse/SPARK-13711
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.1, 1.6.0
>Reporter: Era
>
> In my application the Java Spark context is created with an unavailable master 
> URL (you may assume the master is down for maintenance). Creating the Java 
> Spark context then stops the JVM that runs the Spark driver, with JVM exit 
> code 50.
> When I checked the logs I found SparkUncaughtExceptionHandler calling 
> System.exit. My program should run forever. 
> package test.mains;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaSparkContext;
> public class CheckJavaSparkContext {
> public static void main(String[] args) {
> SparkConf conf = new SparkConf();
> conf.setAppName("test");
> conf.setMaster("spark://sunshinee:7077");
> try {
> new JavaSparkContext(conf);
> } catch (Throwable e) {
> System.out.println("Caught an exception : " + e.getMessage());
>
> }
> System.out.println("Waiting to complete...");
> while (true) {
> }
> }
> }
> Output log
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 16/03/04 18:01:15 INFO SparkContext: Running Spark version 1.6.0
> 16/03/04 18:01:17 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/03/04 18:01:17 WARN Utils: Your hostname, pesamara-mobl-vm1 resolves to a 
> loopback address: 127.0.0.1; using 10.30.9.107 instead (on interface eth0)
> 16/03/04 18:01:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 16/03/04 18:01:18 INFO SecurityManager: Changing view acls to: ps40233
> 16/03/04 18:01:18 INFO SecurityManager: Changing modify acls to: ps40233
> 16/03/04 18:01:18 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(ps40233); users 
> with modify permissions: Set(ps40233)
> 16/03/04 18:01:19 INFO Utils: Successfully started service 'sparkDriver' on 
> port 55309.
> 16/03/04 18:01:21 INFO Slf4jLogger: Slf4jLogger started
> 16/03/04 18:01:21 INFO Remoting: Starting remoting
> 16/03/04 18:01:22 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkDriverActorSystem@10.30.9.107:52128]
> 16/03/04 18:01:22 INFO Utils: Successfully started service 
> 'sparkDriverActorSystem' on port 52128.
> 16/03/04 18:01:22 INFO SparkEnv: Registering MapOutputTracker
> 16/03/04 18:01:22 INFO SparkEnv: Registering BlockManagerMaster
> 16/03/04 18:01:22 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-87c20178-357d-4252-a46a-62a755568a98
> 16/03/04 18:01:22 INFO MemoryStore: MemoryStore started with capacity 457.7 MB
> 16/03/04 18:01:22 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/03/04 18:01:23 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 16/03/04 18:01:23 INFO SparkUI: Started SparkUI at http://10.30.9.107:4040
> 16/03/04 18:01:24 INFO AppClient$ClientEndpoint: Connecting to master 
> spark://sunshinee:7077...
> 16/03/04 18:01:24 WARN AppClient$ClientEndpoint: Failed to connect to master 
> sunshinee:7077
> java.io.IOException: Failed to connect to sunshinee:7077
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
>  at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
> at 
> org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv

[jira] [Commented] (SPARK-13699) Spark SQL drops the table in "overwrite" mode while writing into table

2016-03-07 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183996#comment-15183996
 ] 

Suresh Thalamati commented on SPARK-13699:
--

Thank you for providing the reproduction; I was able to reproduce the issue. 
The problem is that you are trying to overwrite a table that is also being 
read by the data frame. This is not allowed and it should fail with an error 
(in some cases I get org.apache.spark.sql.AnalysisException: Cannot overwrite 
table `t1` that is also being read from). I think this usage should raise an 
error. 

Truncate is an interesting option, especially with the jdbc data source, but 
it will not address the problem you are running into; it will hit the same 
problem as Overwrite.
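
For reference, a minimal sketch of the pattern that triggers the error (the 
table name `t1` and the derived column are placeholders; Spark 1.6-style API 
assumed):
{code}
import org.apache.spark.sql.SaveMode

// Read a Hive table into a DataFrame and derive a new DataFrame from it.
val src = sqlContext.table("t1")
val updated = src.withColumn("col2", src("col2") * 2)

// Attempting to overwrite the same table that is still being read should fail with:
// org.apache.spark.sql.AnalysisException: Cannot overwrite table `t1` that is also being read from
updated.write.mode(SaveMode.Overwrite).saveAsTable("t1")
{code}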
 

{code}

scala> tgtFinal.explain
== Physical Plan ==
Union
:- WholeStageCodegen
:  :  +- Project 
[col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,cast(enddate#230
 as string) AS enddate#263,updatedate#231]
:  : +- Filter (currind#228 = N)
:  :+- INPUT
:  +- HiveTableScan 
[enddate#230,updatedate#231,col2#224,col1#223,batchid#227,col3#225,startdate#229,currind#228,col4#226],
 MetastoreRelation default, tgt_table, None
:- WholeStageCodegen
:  :  +- Project 
[col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,cast(enddate#230
 as string) AS enddate#264,updatedate#231]
:  : +- INPUT
:  +- Except
: :- WholeStageCodegen
: :  :  +- Filter (currind#228 = Y)
: :  : +- INPUT
: :  +- HiveTableScan 
[col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,enddate#230,updatedate#231],
 MetastoreRelation default, tgt_table, None
: +- WholeStageCodegen
::  +- Project 
[col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,enddate#230,updatedate#231]
:: +- BroadcastHashJoin [cast(col1#223 as double)], [cast(col1#219 
as double)], Inner, BuildRight, None
:::- Filter (currind#228 = Y)
:::  +- INPUT
::+- INPUT
::- HiveTableScan 
[col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,enddate#230,updatedate#231],
 MetastoreRelation default, tgt_table, None
:+- HiveTableScan [col1#219], MetastoreRelation default, src_table, None
:- WholeStageCodegen
:  :  +- Project [col1#223,col2#224,col3#225,col4#226,batchid#227,UDF(col1#223) 
AS currInd#232,startdate#229,2016-03-07 15:12:20.584 AS 
endDate#265,1457392340584000 AS updateDate#234]
:  : +- BroadcastHashJoin [cast(col1#223 as double)], [cast(col1#219 as 
double)], Inner, BuildRight, None
:  ::- Project 
[col3#225,startdate#229,col2#224,col1#223,batchid#227,col4#226]
:  ::  +- Filter (currind#228 = Y)
:  :: +- INPUT
:  :+- INPUT
:  :- HiveTableScan 
[col3#225,startdate#229,col2#224,col1#223,batchid#227,col4#226,currind#228], 
MetastoreRelation default, tgt_table, None
:  +- HiveTableScan [col1#219], MetastoreRelation default, src_table, None
+- WholeStageCodegen
   :  +- Project [cast(col1#219 as string) AS 
col1#266,col2#220,col3#221,col4#222,UDF(cast(col1#219 as string)) AS 
batchId#235,UDF(cast(col1#219 as string)) AS currInd#236,1457392340584000 AS 
startDate#237,date_format(cast(UDF(cast(col1#219 as string)) as 
timestamp),-MM-dd HH:mm:ss) AS endDate#238,1457392340584000 AS 
updateDate#239]
   : +- INPUT
   +- HiveTableScan [col1#219,col2#220,col3#221,col4#222], MetastoreRelation 
default, src_table, None

scala> 
{code}

> Spark SQL drops the table in "overwrite" mode while writing into table
> --
>
> Key: SPARK-13699
> URL: https://issues.apache.org/jira/browse/SPARK-13699
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Dhaval Modi
> Attachments: stackTrace.txt
>
>
> Hi,
> While writing the dataframe to HIVE table with "SaveMode.Overwrite" option.
> E.g.
> tgtFinal.write.mode(SaveMode.Overwrite).saveAsTable("tgt_table")
> sqlContext drop the table instead of truncating.
> This is causing error while overwriting.
> Adding stacktrace & commands to reproduce the issue,
> Thanks & Regards,
> Dhaval



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13726) Spark 1.6.0 stopping working for HiveThriftServer2 and registerTempTable

2016-03-07 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183991#comment-15183991
 ] 

Yin Huai edited comment on SPARK-13726 at 3/7/16 11:21 PM:
---

I took a look at the code. I think this change is caused by Session Management 
added in 1.6 (https://issues.apache.org/jira/browse/SPARK-10810). Basically, 
every JDBC connection creates its own session, so it does not see a temp table 
registered through df.registerTempTable in another session. You can set 
{{spark.sql.hive.thriftServer.singleSession}} to {{true}} to change the 
behavior back.


was (Author: yhuai):
I took a look at the code. I think this change is caused by Session Management 
added in 1.6. Basically, every jdbc session creates its own session. So, it 
does not see temp table registered through df.registerTempTable, which is 
registered in another session. You can set 
{{spark.sql.hive.thriftServer.singleSession}} to {{true}} to change the 
behavior back.

> Spark 1.6.0 stopping working for HiveThriftServer2 and registerTempTable
> 
>
> Key: SPARK-13726
> URL: https://issues.apache.org/jira/browse/SPARK-13726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Nguyen
>Priority: Blocker
>
> In Spark 1.5.2, DataFrame.registerTempTable works and  
> hiveContext.table(registerTableName) and HiveThriftServer2 see those tables.
> In Spark 1.6.0, hiveContext.table(registerTableName) and HiveThriftServer2 do 
> not see those tables, even though DataFrame.registerTempTable does not return 
> an error.
> Since this feature used to work in Spark 1.5.2, there is existing code that 
> breaks after upgrading to Spark 1.6.0, so this issue is a blocker and urgent. 
> Therefore, please have it fixed asap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13726) Spark 1.6.0 stopping working for HiveThriftServer2 and registerTempTable

2016-03-07 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183991#comment-15183991
 ] 

Yin Huai commented on SPARK-13726:
--

I took a look at the code. I think this change is caused by Session Management 
added in 1.6. Basically, every JDBC connection creates its own session, so it 
does not see a temp table registered through df.registerTempTable in another 
session. You can set {{spark.sql.hive.thriftServer.singleSession}} to {{true}} 
to change the behavior back.
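
As a sketch of the workaround (the flag has to be in the SparkConf before the 
HiveContext backing the Thrift server is created; the app name is a 
placeholder):
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Restore the pre-1.6 single-session behavior so JDBC clients of the Thrift
// server see temp tables registered on the driver-side HiveContext.
val conf = new SparkConf()
  .setAppName("thrift-single-session")
  .set("spark.sql.hive.thriftServer.singleSession", "true")
val sc = new SparkContext(conf)
// ...create the HiveContext from sc and start the Thrift server as before.
{code}
The same flag can also be passed on the command line, e.g. 
{{--conf spark.sql.hive.thriftServer.singleSession=true}}.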

> Spark 1.6.0 stopping working for HiveThriftServer2 and registerTempTable
> 
>
> Key: SPARK-13726
> URL: https://issues.apache.org/jira/browse/SPARK-13726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Nguyen
>Priority: Blocker
>
> In Spark 1.5.2, DataFrame.registerTempTable works and  
> hiveContext.table(registerTableName) and HiveThriftServer2 see those tables.
> In Spark 1.6.0, hiveContext.table(registerTableName) and HiveThriftServer2 do 
> not see those tables, even though DataFrame.registerTempTable does not return 
> an error.
> Since this feature used to work in Spark 1.5.2, there is existing code that 
> breaks after upgrading to Spark 1.6.0, so this issue is a blocker and urgent. 
> Therefore, please have it fixed asap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13731) expression evaluation for NaN in select statement

2016-03-07 Thread Ian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian updated SPARK-13731:

Description: 
We are expecting that arithmetic expression a/b should be:
1. returning NaN if a=0 and b=0
2. returning Infinity if a=1 and b=0

Is the expectation reasonable? 
The following is a simple test case snippet that read from storage and evaluate 
arithmetic in select.
It si assuming org.apache.spark.sql.hive.execution.SQLQuerySuite: 
{code}
  test("Expression should be evaluated to Nan/Infinity in Select") {
withTable("testNan") {
  withTempTable("src") {
Seq((1d, 0d), (0d, 0d)).toDF().registerTempTable("src")
sql("CREATE TABLE testNan(a double, b double) STORED AS PARQUET AS 
SELECT * FROM src")
  }

  checkAnswer(sql(
"""
  |SELECT a/b FROM testNan
""".stripMargin),
Seq(
  Row(Double.PositiveInfinity),
  Row(Double.NaN)
)
  )
}
  }


== Physical Plan ==
Project [(a#28 / b#29) AS _c0#30]
+- Scan ParquetRelation: default.testnan[a#28,b#29] InputPaths: 
file:/private/var/folders/dy/19y6pfm92pj9s40mbs8xd9hmgp/T/warehouse--5b617080-e909-4812-90e8-63d2dd0aef5a/testnan
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
![Infinity] [null]
![NaN]  [null]
  
{code}

  was:
We are expecting arithmetic expression a/b should be:
1. returning NaN if a=0 and b=0
2. returning Infinity if a=1 and b=0

Is the expectation reasonable? 
The following is a simple test case snippet that read from storage and evaluate 
arithmetic in select.
It si assuming org.apache.spark.sql.hive.execution.SQLQuerySuite: 
{code}
  test("Expression should be evaluated to Nan/Infinity in Select") {
withTable("testNan") {
  withTempTable("src") {
Seq((1d, 0d), (0d, 0d)).toDF().registerTempTable("src")
sql("CREATE TABLE testNan(a double, b double) STORED AS PARQUET AS 
SELECT * FROM src")
  }

  checkAnswer(sql(
"""
  |SELECT a/b FROM testNan
""".stripMargin),
Seq(
  Row(Double.PositiveInfinity),
  Row(Double.NaN)
)
  )
}
  }


== Physical Plan ==
Project [(a#28 / b#29) AS _c0#30]
+- Scan ParquetRelation: default.testnan[a#28,b#29] InputPaths: 
file:/private/var/folders/dy/19y6pfm92pj9s40mbs8xd9hmgp/T/warehouse--5b617080-e909-4812-90e8-63d2dd0aef5a/testnan
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
![Infinity] [null]
![NaN]  [null]
  
{code}


> expression evaluation for NaN in select statement
> -
>
> Key: SPARK-13731
> URL: https://issues.apache.org/jira/browse/SPARK-13731
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Ian
>
> We are expecting that arithmetic expression a/b should be:
> 1. returning NaN if a=0 and b=0
> 2. returning Infinity if a=1 and b=0
> Is the expectation reasonable? 
> The following is a simple test case snippet that reads from storage and 
> evaluates arithmetic in select.
> It is assuming org.apache.spark.sql.hive.execution.SQLQuerySuite: 
> {code}
>   test("Expression should be evaluated to Nan/Infinity in Select") {
> withTable("testNan") {
>   withTempTable("src") {
> Seq((1d, 0d), (0d, 0d)).toDF().registerTempTable("src")
> sql("CREATE TABLE testNan(a double, b double) STORED AS PARQUET AS 
> SELECT * FROM src")
>   }
>   checkAnswer(sql(
> """
>   |SELECT a/b FROM testNan
> """.stripMargin),
> Seq(
>   Row(Double.PositiveInfinity),
>   Row(Double.NaN)
> )
>   )
> }
>   }
> == Physical Plan ==
> Project [(a#28 / b#29) AS _c0#30]
> +- Scan ParquetRelation: default.testnan[a#28,b#29] InputPaths: 
> file:/private/var/folders/dy/19y6pfm92pj9s40mbs8xd9hmgp/T/warehouse--5b617080-e909-4812-90e8-63d2dd0aef5a/testnan
> == Results ==
> !== Correct Answer - 2 ==   == Spark Answer - 2 ==
> ![Infinity] [null]
> ![NaN]  [null]
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13665) Initial separation of concerns

2016-03-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13665.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Initial separation of concerns
> --
>
> Key: SPARK-13665
> URL: https://issues.apache.org/jira/browse/SPARK-13665
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
> Fix For: 2.0.0
>
>
> The goal here is to break apart: File Management, code to deal with specific 
> formats and query planning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13725) Spark 1.6.0 stopping working for HiveThriftServer2 and registerTempTable

2016-03-07 Thread Michael Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183964#comment-15183964
 ] 

Michael Nguyen commented on SPARK-13725:


I typically do not set issues to Blocker. I set this issue to Blocker, because 
these specified APIs used to work in earlier versions of Spark up to 1.5.2, and 
there is existing code that relies on that and now fails because of this issue 
in Spark 1.6.0. 

> Spark 1.6.0 stopping working for HiveThriftServer2 and registerTempTable
> 
>
> Key: SPARK-13725
> URL: https://issues.apache.org/jira/browse/SPARK-13725
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: Spark 1.6.0 with DataFrame.registerTempTable and 
> HiveThriftServer2
>Reporter: Michael Nguyen
>
> In Spark 1.5.2, DataFrame.registerTempTable API works correctly and 
> HiveThriftServer2 sees and returns temp tables that are registered via that 
> API.
> In Spark 1.6.0, that stopped working. The registerTempTable API does not 
> return an error, so it is a false positive, and HiveThriftServer2 does not 
> see such tables. hiveContext.table(registerTableName) indicates it does not 
> see those tables either.
> Is there a temporary work-around solution in Spark 1.6.0 ? When would it be 
> fixed ?
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13733) Support initial weight distribution in personalized PageRank

2016-03-07 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-13733:
-

 Summary: Support initial weight distribution in personalized 
PageRank
 Key: SPARK-13733
 URL: https://issues.apache.org/jira/browse/SPARK-13733
 Project: Spark
  Issue Type: New Feature
  Components: GraphX
Reporter: Xiangrui Meng


It would be nice to support personalized PageRank with an initial weight 
distribution besides a single vertex. It should be easy to modify the current 
implementation to add this support.
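
For context, the current GraphX API takes only a single source vertex; a rough 
sketch (assuming an existing {{graph: Graph[VD, ED]}}; vertex ids and weights 
are made up):
{code}
import org.apache.spark.graphx._

// Current API: all of the teleport/reset mass is concentrated on one vertex.
val ranks: Graph[Double, Double] = graph.personalizedPageRank(42L, 0.0001)

// The proposal would accept an initial weight distribution over several
// vertices instead, e.g. Seq((1L, 0.5), (2L, 0.3), (3L, 0.2)) summing to 1.0.
// (Hypothetical extension, not part of the current implementation.)
{code}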



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13726) Spark 1.6.0 stopping working for HiveThriftServer2 and registerTempTable

2016-03-07 Thread Michael Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183953#comment-15183953
 ] 

Michael Nguyen commented on SPARK-13726:


The Thrift server was started with HiveThriftServer2.startWithContext. However, 
this issue is not specific to HiveThriftServer2.startWithContext. The 
underlying cause is that after the table is registered via 
DataFrame.registerTempTable, hiveContext.table(registerTableName) still fails 
because it does not see that table as registered.
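
A minimal sketch of the reported flow (the path and table name are 
placeholders; assumes a driver program or spark-shell with an existing 
SparkContext {{sc}}):
{code}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val hiveContext = new HiveContext(sc)
val df = hiveContext.read.json("examples/src/main/resources/people.json")
df.registerTempTable("people")

// Works in 1.5.2; in 1.6.0 the report is that neither this lookup nor JDBC
// clients of the Thrift server can see the registered temp table.
hiveContext.table("people").show()

HiveThriftServer2.startWithContext(hiveContext)
{code}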

> Spark 1.6.0 stopping working for HiveThriftServer2 and registerTempTable
> 
>
> Key: SPARK-13726
> URL: https://issues.apache.org/jira/browse/SPARK-13726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Nguyen
>Priority: Blocker
>
> In Spark 1.5.2, DataFrame.registerTempTable works and  
> hiveContext.table(registerTableName) and HiveThriftServer2 see those tables.
> In Spark 1.6.0, hiveContext.table(registerTableName) and HiveThriftServer2 do 
> not see those tables, even though DataFrame.registerTempTable does not return 
> an error.
> Since this feature used to work in Spark 1.5.2, there is existing code that 
> breaks after upgrading to Spark 1.6.0. so this issue is a blocker and urgent. 
> Therefore, please have it fixed asap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13682) Finalize the public API for FileFormat

2016-03-07 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183949#comment-15183949
 ] 

Reynold Xin commented on SPARK-13682:
-

Most "trait" should probably become "abstract class".


> Finalize the public API for FileFormat
> --
>
> Key: SPARK-13682
> URL: https://issues.apache.org/jira/browse/SPARK-13682
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Michael Armbrust
>
> The current file format interface needs to be cleaned up before its 
> acceptable for public consumption:
>  - Have a version that takes Row and does a conversion, hide the internal API.
>  - Remove bucketing
>  - Remove RDD and the broadcastedConf
>  - Remove SQLContext (maybe include SparkSession?)
>  - Pass a better conf object



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13596) Move misc top-level build files into appropriate subdirs

2016-03-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13596.
-
   Resolution: Fixed
 Assignee: Sean Owen
Fix Version/s: 2.0.0

> Move misc top-level build files into appropriate subdirs
> 
>
> Key: SPARK-13596
> URL: https://issues.apache.org/jira/browse/SPARK-13596
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
> Fix For: 2.0.0
>
>
> I'd like to file away a bunch of misc files that are in the top level of the 
> project in order to further tidy the build for 2.0.0. See also SPARK-13529, 
> SPARK-13548.
> Some of these may turn out to be difficult or impossible to move.
> I'd ideally like to move these files into {{build/}}:
> - {{.rat-excludes}}
> - {{checkstyle.xml}}
> - {{checkstyle-suppressions.xml}}
> - {{pylintrc}}
> - {{scalastyle-config.xml}}
> - {{tox.ini}}
> - {{project/}} (or does SBT need this in the root?)
> And ideally, these would go under {{dev/}}
> - {{make-distribution.sh}}
> And remove these
> - {{sbt/sbt}} (backwards-compatible location of {{build/sbt}} right?)
> Edited to add: apparently this can go in {{.github}} now:
> - {{CONTRIBUTING.md}}
> Other files in the top level seem to need to be there, like {{README.md}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13692) Fix trivial Coverity/Checkstyle defects

2016-03-07 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-13692:
--
Description: 
This issue fixes the following potential bugs and Java coding style detected by 
Coverity and Checkstyle.

  * Implement both null and type checking in equals functions.
  * Fix wrong type casting logic in SimpleJavaBean2.equals.
  * Add `implement Cloneable` to `UTF8String` and `SortedIterator`.
  * Remove dereferencing before null check in `AbstractBytesToBytesMapSuite`.
  * Fix coding style: Add '{}' to single `for` statement in mllib examples.
  * Remove unused imports in `ColumnarBatch` and `JavaKinesisStreamSuite.java`.
  * Remove unused fields in `ChunkFetchIntegrationSuite`.
  * Add `stop()` to prevent resource leak.


Please note that the last two checkstyle errors exist on newly added commits 
after [SPARK-13583].

  was:
This issue fixes the following potential bugs and Java coding style detected by 
Coverity and Checkstyle.

  * Implement both null and type checking in equals functions.
  * Fix wrong type casting logic in SimpleJavaBean2.equals.
  * Add `implement Cloneable` to `UTF8String` and `SortedIterator`.
  * Remove dereferencing before null check in `AbstractBytesToBytesMapSuite`.
  * Fix coding style: Add '{}' to single `for` statement in mllib examples.
  * Remove unused imports in `ColumnarBatch`.
  * Remove unused fields in `ChunkFetchIntegrationSuite`.
  * Add `stop()` to prevent resource leak.


Please note that the last two checkstyle errors exist on newly added commits 
after [SPARK-13583].


> Fix trivial Coverity/Checkstyle defects
> ---
>
> Key: SPARK-13692
> URL: https://issues.apache.org/jira/browse/SPARK-13692
> Project: Spark
>  Issue Type: Bug
>  Components: Examples, Spark Core, SQL
>Reporter: Dongjoon Hyun
>Priority: Trivial
>
> This issue fixes the following potential bugs and Java coding style detected 
> by Coverity and Checkstyle.
>   * Implement both null and type checking in equals functions.
>   * Fix wrong type casting logic in SimpleJavaBean2.equals.
>   * Add `implement Cloneable` to `UTF8String` and `SortedIterator`.
>   * Remove dereferencing before null check in `AbstractBytesToBytesMapSuite`.
>   * Fix coding style: Add '{}' to single `for` statement in mllib examples.
>   * Remove unused imports in `ColumnarBatch` and 
> `JavaKinesisStreamSuite.java`.
>   * Remove unused fields in `ChunkFetchIntegrationSuite`.
>   * Add `stop()` to prevent resource leak.
> Please note that the last two checkstyle errors exist on newly added commits 
> after [SPARK-13583].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13638) Support for saving with a quote mode

2016-03-07 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-13638:
-
Description: 
https://github.com/databricks/spark-csv/pull/254

tobithiel reported this.

{quote}
I'm dealing with some messy csv files and being able to just quote all fields 
is very useful,
so that other applications don't misunderstand the file because of some sketchy 
characters
{quote}

When writing there are several quote modes in apache commons csv. (See 
https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/QuoteMode.html)

This might have to be supported.

However, it looks univocity parser used for writing (it looks currently only 
this library is supported) does not support this quote mode. I think we can 
drop this backwards compatibility if we are not going to add apache commons csv.

This is a reminder that it might break backwards compatibility for the options, 
{{quoteMode}}.

  was:
https://github.com/databricks/spark-csv/pull/254

tobithiel reported this.

{quote}
I'm dealing with some messy csv files and being able to just quote all fields 
is very useful,
so that other applications don't misunderstand the file because of some sketchy 
characters
{quote}

When writing there are several quote modes in apache commons csv. (See 
https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/QuoteMode.html)

This might have to be supported.

However, it looks univocity parser used for writing (it looks currently only 
this library is supported) does not support this quote mode. I think we can 
drop this backwards compatibility if we are not going to add apache commons csv.

This is a reminder that it will break backwards compatibility for the options, 
{{quoteMode}}.


> Support for saving with a quote mode
> 
>
> Key: SPARK-13638
> URL: https://issues.apache.org/jira/browse/SPARK-13638
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> https://github.com/databricks/spark-csv/pull/254
> tobithiel reported this.
> {quote}
> I'm dealing with some messy csv files and being able to just quote all fields 
> is very useful,
> so that other applications don't misunderstand the file because of some 
> sketchy characters
> {quote}
> When writing there are several quote modes in apache commons csv. (See 
> https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/QuoteMode.html)
> This might have to be supported.
> However, it looks like the univocity parser used for writing (currently it 
> looks like only this library is supported) does not support this quote mode. 
> I think we can drop this backwards compatibility if we are not going to add 
> Apache Commons CSV.
> This is a reminder that it might break backwards compatibility for the 
> options, {{quoteMode}}.
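
For illustration, a hedged sketch of how the existing option is used on the 
spark-csv side (the output path and the DataFrame {{df}} are placeholders; the 
mode names mirror the Commons CSV QuoteMode enum linked above):
{code}
// Quote every field so downstream tools are not confused by sketchy characters.
df.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("quoteMode", "ALL") // other Commons CSV modes: MINIMAL, NON_NUMERIC, NONE
  .save("/tmp/quoted-output")
{code}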



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13732) Remove projectList from Windows

2016-03-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13732:


Assignee: Apache Spark

> Remove projectList from Windows
> ---
>
> Key: SPARK-13732
> URL: https://issues.apache.org/jira/browse/SPARK-13732
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> projectList is useless. Remove it from the class Window. It simplifies the 
> code in Analyzer and Optimizer. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


