[jira] [Commented] (SPARK-12715) Improve test coverage

2016-01-30 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125247#comment-15125247
 ] 

Reynold Xin commented on SPARK-12715:
-

[~davies]  are there specific things you have in mind?

cc [~hvanhovell]

> Improve test coverage
> -
>
> Key: SPARK-12715
> URL: https://issues.apache.org/jira/browse/SPARK-12715
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Davies Liu
>
> We could bring all of the Hive parser test cases into Spark, to make sure we will 
> not break compatibility with Hive (we could do more, and skip some of them 
> that do not make sense).






[jira] [Commented] (SPARK-12772) Better error message for parsing failure?

2016-01-30 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125246#comment-15125246
 ] 

Reynold Xin commented on SPARK-12772:
-

cc [~hvanhovell] / [~viirya] any idea about this one?



> Better error message for parsing failure?
> -
>
> Key: SPARK-12772
> URL: https://issues.apache.org/jira/browse/SPARK-12772
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>
> {code}
> scala> sql("select case if(true, 'one', 'two')").explain(true)
> org.apache.spark.sql.AnalysisException: org.antlr.runtime.EarlyExitException
> line 1:34 required (...)+ loop did not match anything at input '' in 
> case expression
> ; line 1 pos 34
>   at 
> org.apache.spark.sql.catalyst.parser.ParseErrorReporter.throwError(ParseDriver.scala:140)
>   at 
> org.apache.spark.sql.catalyst.parser.ParseErrorReporter.throwError(ParseDriver.scala:129)
>   at 
> org.apache.spark.sql.catalyst.parser.ParseDriver$.parse(ParseDriver.scala:77)
>   at 
> org.apache.spark.sql.catalyst.CatalystQl.createPlan(CatalystQl.scala:53)
>   at 
> org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
>   at 
> org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
> {code}
> Is there a way to report something more helpful than "required (...)+ loop did 
> not match anything at input"?






[jira] [Resolved] (SPARK-12689) Migrate DDL parsing to the newly absorbed parser

2016-01-30 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-12689.
-
   Resolution: Fixed
 Assignee: Liang-Chi Hsieh
Fix Version/s: 2.0.0

> Migrate DDL parsing to the newly absorbed parser
> 
>
> Key: SPARK-12689
> URL: https://issues.apache.org/jira/browse/SPARK-12689
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Liang-Chi Hsieh
> Fix For: 2.0.0
>
>







[jira] [Resolved] (SPARK-13070) Points out which physical file is the trouble maker when Parquet schema merging fails

2016-01-30 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13070.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Points out which physical file is the trouble maker when Parquet schema 
> merging fails
> -
>
> Key: SPARK-13070
> URL: https://issues.apache.org/jira/browse/SPARK-13070
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>Priority: Minor
> Fix For: 2.0.0
>
>
> As a user, I'd like to know which physical file is the troublemaker when 
> Parquet schema merging fails. Currently, we only have an error message like 
> this:
> {quote}
> Failed to merge incompatible data types LongType and IntegerType
> {quote}
> It would be nice to add the file path and the actual schema.
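A minimal sketch of the kind of error-wrapping this asks for (the {{filePath}} variable, the running {{mergedSchema}} var and the exception wording are illustrative, not the actual change that was committed):
{code}
// Sketch only: attach the offending file's path and its schema to the merge failure.
try {
  mergedSchema = mergedSchema.merge(fileSchema)
} catch {
  case cause: Throwable =>
    throw new org.apache.spark.SparkException(
      s"Failed to merge schema of file $filePath:\n${fileSchema.treeString}", cause)
}
{code}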






[jira] [Commented] (SPARK-12951) Support spilling in generate aggregate

2016-01-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125245#comment-15125245
 ] 

Apache Spark commented on SPARK-12951:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/10998

> Support spilling in generate aggregate
> --
>
> Key: SPARK-12951
> URL: https://issues.apache.org/jira/browse/SPARK-12951
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>







[jira] [Commented] (SPARK-7009) Build assembly JAR via ant to avoid zip64 problems

2016-01-30 Thread Zhan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125205#comment-15125205
 ] 

Zhan Zhang commented on SPARK-7009:
---

Yes, this one is now obsolete.

> Build assembly JAR via ant to avoid zip64 problems
> --
>
> Key: SPARK-7009
> URL: https://issues.apache.org/jira/browse/SPARK-7009
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.3.0
> Environment: Java 7+
>Reporter: Steve Loughran
> Attachments: check_spark_python.sh
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> SPARK-1911 shows the problem that JDK 7+ uses zip64 to build large JARs, a 
> format incompatible with Java and PySpark.
> Provided the total number of .class files and resources is under 64K, Ant can be 
> used to make the final JAR instead, perhaps by unzipping the Maven-generated JAR 
> and then rezipping it with zip64=never before publishing the artifact via Maven.






[jira] [Commented] (SPARK-13105) Spark 1.6 and earlier should reject NATURAL JOIN queries instead of returning wrong answers

2016-01-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125199#comment-15125199
 ] 

Apache Spark commented on SPARK-13105:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/10997

> Spark 1.6 and earlier should reject NATURAL JOIN queries instead of returning 
> wrong answers
> ---
>
> Key: SPARK-13105
> URL: https://issues.apache.org/jira/browse/SPARK-13105
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.2, 1.6.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> In Spark 1.6 and earlier, Spark SQL does not support {{NATURAL JOIN}} 
> queries. However, its SQL parser does not consider {{NATURAL}} to be a 
> reserved word, which causes natural joins to be parsed as regular joins where 
> the left table has been aliased. For instance,
> {code}
> SELECT * FROM foo NATURAL JOIN bar
> {code}
> gets interpreted as "foo JOIN bar" where "foo" is aliased to "natural".
> Rather than doing this, which leads to confusing / wrong results for users 
> who expect NATURAL JOIN behavior, Spark should immediately reject these 
> queries at analysis time and should provide an informative error message.
> We're going to add natural join support in Spark 2.0, but for earlier 
> versions we should add a bugfix to throw errors.






[jira] [Created] (SPARK-13105) Spark 1.6 and earlier should reject NATURAL JOIN queries instead of returning wrong answers

2016-01-30 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-13105:
--

 Summary: Spark 1.6 and earlier should reject NATURAL JOIN queries 
instead of returning wrong answers
 Key: SPARK-13105
 URL: https://issues.apache.org/jira/browse/SPARK-13105
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0, 1.5.2, 1.4.1
Reporter: Josh Rosen
Assignee: Josh Rosen


In Spark 1.6 and earlier, Spark SQL does not support {{NATURAL JOIN}} queries. 
However, its SQL parser does not consider {{NATURAL}} to be a reserved word, 
which causes natural joins to be parsed as regular joins where the left table 
has been aliased. For instance,

{code}
SELECT * FROM foo NATURAL JOIN bar
{code}

gets interpreted as "foo JOIN bar" where "foo" is aliased to "natural".

Rather than doing this, which leads to confusing / wrong results for users who 
expect NATURAL JOIN behavior, Spark should immediately reject these queries at 
analysis time and should provide an informative error message.

We're going to add natural join support in Spark 2.0, but for earlier versions 
we should add a bugfix to throw errors.
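For illustration, a minimal reproduction sketch against the 1.6 API (the tables and column names here are made up, not from the original report):

{code}
// Hypothetical tables; in 1.6 and earlier the word NATURAL is silently taken
// as an alias for the left table instead of as a join type.
val foo = sqlContext.range(3).toDF("id")
val bar = sqlContext.range(3).toDF("id")
foo.registerTempTable("foo")
bar.registerTempTable("bar")

// Parses roughly as `foo AS natural JOIN bar`, i.e. a regular join with no
// join condition, rather than a natural join on the common column `id`.
sqlContext.sql("SELECT * FROM foo NATURAL JOIN bar").explain(true)
{code}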






[jira] [Commented] (SPARK-7009) Build assembly JAR via ant to avoid zip64 problems

2016-01-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125153#comment-15125153
 ] 

Josh Rosen commented on SPARK-7009:
---

I believe that this will be obsoleted by SPARK-11157, no?

> Build assembly JAR via ant to avoid zip64 problems
> --
>
> Key: SPARK-7009
> URL: https://issues.apache.org/jira/browse/SPARK-7009
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.3.0
> Environment: Java 7+
>Reporter: Steve Loughran
> Attachments: check_spark_python.sh
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> SPARK-1911 shows the problem that JDK 7+ uses zip64 to build large JARs, a 
> format incompatible with Java and PySpark.
> Provided the total number of .class files and resources is under 64K, Ant can be 
> used to make the final JAR instead, perhaps by unzipping the Maven-generated JAR 
> and then rezipping it with zip64=never before publishing the artifact via Maven.






[jira] [Commented] (SPARK-6305) Add support for log4j 2.x to Spark

2016-01-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125152#comment-15125152
 ] 

Josh Rosen commented on SPARK-6305:
---

Hey [~srowen], what do you think the next steps are in evaluating how to handle 
log4j 2.x in Spark? Just pinging this now since I'm trying to resolve major 
build/dep changes earlier in the 2.0.0 cycle.

> Add support for log4j 2.x to Spark
> --
>
> Key: SPARK-6305
> URL: https://issues.apache.org/jira/browse/SPARK-6305
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Tal Sliwowicz
>Priority: Minor
>
> log4j 2 requires replacing the slf4j binding and adding the log4j jars to the 
> classpath. Since there are shaded jars, this must be done during the build.






[jira] [Commented] (SPARK-12154) Upgrade to Jersey 2

2016-01-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125151#comment-15125151
 ] 

Josh Rosen commented on SPARK-12154:


Since we're in the middle of the 2.0.0 development cycle right now, it seems 
like it would be a good time to revisit upgrading to Jersey 2. [~mcheah], would 
you or someone else be interested in helping to scope out this task to figure 
out what it's going to require?

> Upgrade to Jersey 2
> ---
>
> Key: SPARK-12154
> URL: https://issues.apache.org/jira/browse/SPARK-12154
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Spark Core
>Affects Versions: 1.5.2
>Reporter: Matt Cheah
>
> Fairly self-explanatory: Jersey 1 is a bit old and could use an upgrade. 
> Library conflicts for Jersey are difficult to work around - see the discussion on 
> SPARK-11081. It's easier to upgrade Jersey entirely, but we should target 
> Spark 2.0 since this may be a breaking change for users who were using Jersey 1 in 
> their Spark jobs.






[jira] [Commented] (SPARK-11416) Upgrade kryo package to version 3.0

2016-01-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125149#comment-15125149
 ] 

Josh Rosen commented on SPARK-11416:


Since we're in the middle of 2.0.0 development, now seems like a good time to 
revisit upgrading to Kryo 3. For those of you more familiar with this issue, 
could you help me to break down this story into some smaller subtasks so we can 
make progress? Does this require coordination with any third parties? What are 
the changes we need in Spark?

> Upgrade kryo package to version 3.0
> ---
>
> Key: SPARK-11416
> URL: https://issues.apache.org/jira/browse/SPARK-11416
> Project: Spark
>  Issue Type: Wish
>  Components: Build
>Affects Versions: 1.5.1
>Reporter: Hitoshi Ozawa
>
> We would like Apache Spark to upgrade the kryo package from 2.x (current) to 
> 3.x.






[jira] [Commented] (SPARK-7019) Build docs on doc changes

2016-01-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125150#comment-15125150
 ] 

Josh Rosen commented on SPARK-7019:
---

This would be great to do but is somewhat blocked at the moment by the fact 
that the doc building dependencies (some Ruby stuff) aren't installed on all 
Jenkins workers.

> Build docs on doc changes
> -
>
> Key: SPARK-7019
> URL: https://issues.apache.org/jira/browse/SPARK-7019
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Reporter: Brennon York
>
> Currently, when a pull request changes the {{docs/}} directory, the docs 
> aren't actually built. When a PR is submitted, the {{git}} history should be 
> checked to see if any doc changes were made and, if so, the docs should be 
> built and any issues reported.






[jira] [Resolved] (SPARK-12822) Change default build to Hadoop 2.7

2016-01-30 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-12822.

Resolution: Won't Fix

It sounds like this is a clear "Won't Fix" as long as we continue to support 
Hadoop 2.2, so I'm going to close this for now. We can re-open if this decision 
changes.

> Change default build to Hadoop 2.7
> --
>
> Key: SPARK-12822
> URL: https://issues.apache.org/jira/browse/SPARK-12822
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Reynold Xin
>







[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-01-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125147#comment-15125147
 ] 

Josh Rosen commented on SPARK-11157:


I'd love to start making progress towards removing the assemblies. Before we do 
so, though, I think there are a few subtasks / obstacles that we need to clear 
first:

- First, I think we should just completely remove the assembly rather than 
giving both assembly and non-assembly options. Every additional option that we 
provide / support adds lots of maintenance burden and it would be nice to 
standardize on a single supported distribution technique.
- Prior to removing the assemblies, it would be great if we could reconfigure 
our tests to not depend on the full assembly JAR in order to run. We already 
have {{SPARK_PREPEND_CLASSPATH}} today, so this might be as simple as making 
that behavior the default and reconfiguring our test scripts to skip the 
assembly step.
- Building up a {{-classpath}} argument that lists hundreds of JARs is going to 
be a debugging nightmare (lots of tools truncate process arguments past some 
limit, etc.), so it would be good to investigate other techniques that we can 
use to pass the classpath to {{java}} without bloating the CLI (maybe using an 
environment variable or some file or something?).
- This is going to require changes to Launcher, shell scripts, and a few other 
places; it would be good to scope out these changes to estimate how much work 
it's going to be.

[~vanzin], are there any other obvious subtasks that I'm not thinking of? I'd 
like to try to see whether we can break down this big task and scope out some 
smaller pieces so we can make incremental progress and get this finished well 
in time for 2.0.0 so we have lots of time to test.

> Allow Spark to be built without assemblies
> --
>
> Key: SPARK-11157
> URL: https://issues.apache.org/jira/browse/SPARK-11157
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Spark Core, YARN
>Reporter: Marcelo Vanzin
> Attachments: no-assemblies.pdf
>
>
> For reasoning, discussion of pros and cons, and other more detailed 
> information, please see attached doc.
> The idea is to be able to build a Spark distribution that has just a 
> directory full of jars instead of the huge assembly files we currently have.
> Getting there requires changes in a bunch of places, I'll try to list the 
> ones I identified in the document, in the order that I think would be needed 
> to not break things:
> * make streaming backends not be assemblies
> Since people may depend on the current assembly artifacts in their 
> deployments, we can't really remove them; but we can make them be dummy jars 
> and rely on dependency resolution to download all the jars.
> PySpark tests would also need some tweaking here.
> * make examples jar not be an assembly
> Probably requires tweaks to the {{run-example}} script. The location of the 
> examples jar would have to change (it won't be able to live in the same place 
> as the main Spark jars anymore).
> * update YARN backend to handle a directory full of jars when launching apps
> Currently YARN localizes the Spark assembly (depending on the user 
> configuration); it needs to be modified so that it can localize all needed 
> libraries instead of a single jar.
> * Modify launcher library to handle the jars directory
> This should be trivial
> * Modify {{assembly/pom.xml}} to generate assembly or a {{libs}} directory 
> depending on which profile is enabled.
> We should keep the option to build with the assembly on by default, for 
> backwards compatibility, to give people time to prepare.
> Filing this bug as an umbrella; please file sub-tasks if you plan to work on 
> a specific part of the issue.






[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-01-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125143#comment-15125143
 ] 

Josh Rosen commented on SPARK-7481:
---

How does this proposal change if we just remove the assembly and ship a folder 
of JARs, as has been proposed elsewhere by [~vanzin]? Does that render this 
proposal moot?

> Add Hadoop 2.6+ profile to pull in object store FS accessors
> 
>
> Key: SPARK-7481
> URL: https://issues.apache.org/jira/browse/SPARK-7481
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.3.1
>Reporter: Steve Loughran
>
> To keep the s3n classpath right, and to add s3a, Swift and Azure, the dependencies 
> of Spark in a 2.6+ profile need to add the relevant object store packages 
> (hadoop-aws, hadoop-openstack, hadoop-azure).
> This adds more stuff to the client bundle, but it means a single Spark 
> package can talk to all of the stores.






[jira] [Resolved] (SPARK-6029) Unshaded "clearspring" classpath leakage + excluded fastutil interferes with apps using clearspring

2016-01-30 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-6029.
---
Resolution: Incomplete

Resolving as "incomplete", since it's not clear whether this issue is still 
valid. If it is, please comment and we can re-open and re-scope. Thanks!

> Unshaded "clearspring" classpath leakage + excluded fastutil interferes with 
> apps using clearspring 
> 
>
> Key: SPARK-6029
> URL: https://issues.apache.org/jira/browse/SPARK-6029
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.2.1
>Reporter: Jim Kleckner
>Priority: Minor
>
> Spark includes the clearspring analytics package but intentionally excludes 
> the dependencies of the fastutil package.
> Spark includes parquet-column which includes fastutil and relocates it under 
> parquet/ but creates a shaded jar file which is incomplete because it shades 
> out some of the fastutil classes, notably Long2LongOpenHashMap, which is 
> present in the fastutil jar file that parquet-column is referencing.
> We are using more of the clearspring classes (e.g. QDigest) and those do 
> depend on missing fastutil classes like Long2LongOpenHashMap.
> Even though I add them to our assembly jar file, the class loader finds the 
> spark assembly and we get runtime class loader errors when we try to use it.
> The 
> [documentation|http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment]
>  and possibly related issue 
> [SPARK-939|https://issues.apache.org/jira/browse/SPARK-939] suggest arguments 
> that I tried with spark-submit:
> {code}
> --conf spark.driver.userClassPathFirst=true \
> --conf spark.executor.userClassPathFirst=true
> {code}
> but we still get the class not found error.
> Could this be a bug with {{userClassPathFirst=true}}?  i.e. should it work?
> In any case, would it be reasonable to not exclude the "fastutil" 
> dependencies?
> See email discussion 
> [here|http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-Spark-excludes-quot-fastutil-quot-dependencies-we-need-tt21812.html]






[jira] [Resolved] (SPARK-5330) Core | Scala 2.11 | Transitive dependency on com.fasterxml.jackson.core :jackson-core:2.3.1 causes compatibility issues

2016-01-30 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-5330.
---
Resolution: Incomplete

I'm going to resolve this as "incomplete" since I'm not sure whether it's still 
valid and there hasn't been any reply to Sean's questions. If this is still a 
valid issue, please comment and we can re-open.

> Core | Scala 2.11 | Transitive dependency on com.fasterxml.jackson.core 
> :jackson-core:2.3.1 causes compatibility issues
> ---
>
> Key: SPARK-5330
> URL: https://issues.apache.org/jira/browse/SPARK-5330
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.2.0
>Reporter: Aniket Bhatnagar
>Priority: Minor
>
> Spark transitively depends on com.fasterxml.jackson.core:jackson-core:2.3.1. 
> Users of jackson-module-scala had to depend on the same version to avoid 
> any class compatibility issues. However, since Scala 2.11, 
> jackson-module-scala is no longer published for version 2.3.1. Since 
> version 2.3.1 is quite old, perhaps we should investigate upgrading to the 
> latest jackson-core.






[jira] [Reopened] (SPARK-12261) pyspark crash for large dataset

2016-01-30 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen reopened SPARK-12261:


> pyspark crash for large dataset
> ---
>
> Key: SPARK-12261
> URL: https://issues.apache.org/jira/browse/SPARK-12261
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.2
> Environment: windows
>Reporter: zihao
>
> I tried to import a local text file (over 100 MB) via textFile in pyspark; when 
> I ran data.take(), it failed and gave error messages including:
> 15/12/10 17:17:43 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
> aborting job
> Traceback (most recent call last):
>   File "E:/spark_python/test3.py", line 9, in 
> lines.take(5)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\rdd.py", line 1299, 
> in take
> res = self.context.runJob(self, takeUpToNumLeft, p)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\context.py", line 
> 916, in runJob
> port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, 
> partitions)
>   File "C:\Anaconda2\lib\site-packages\py4j\java_gateway.py", line 813, in 
> __call__
> answer, self.gateway_client, self.target_id, self.name)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\sql\utils.py", line 
> 36, in deco
> return f(*a, **kw)
>   File "C:\Anaconda2\lib\site-packages\py4j\protocol.py", line 308, in 
> get_return_value
> format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 
> (TID 0, localhost): java.net.SocketException: Connection reset by peer: 
> socket write error
> Then I ran the same code on a small text file, and this time .take() worked fine.
> How can I solve this problem?






[jira] [Resolved] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala

2016-01-30 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13100.
-
   Resolution: Fixed
 Assignee: Yang Wang
Fix Version/s: 2.0.0

> improving the performance of stringToDate method in DateTimeUtils.scala
> ---
>
> Key: SPARK-13100
> URL: https://issues.apache.org/jira/browse/SPARK-13100
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.0
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: screenshot-1.png
>
>
> In the stringToDate method in DateTimeUtils.scala, in order to create a 
> Calendar instance we create a brand new TimeZone instance every time by 
> calling TimeZone.getTimeZone("GMT"). In jdk1.7, however,  this method is 
> synchronized, thus such an approach can cause significant performance loss. 
> Since the same time zone is used each time we call that method, I think we 
> should create a val in the DateTimeUtils singleton object to hold that 
> TimeZone, and use it every time.






[jira] [Updated] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-01-30 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-13101:
---
Target Version/s: 1.6.1
Priority: Blocker  (was: Major)

I'm temporarily marking this as a 1.6.1 blocker so that we make sure to 
investigate and triage before cutting an RC. /cc [~marmbrus]

> Dataset complex types mapping to DataFrame  (element nullability) mismatch
> --
>
> Key: SPARK-13101
> URL: https://issues.apache.org/jira/browse/SPARK-13101
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Deenar Toraskar
>Priority: Blocker
>
> There seems to be a regression between 1.6.0 and 1.6.1 (snapshot build). By 
> default a scala Seq[Double] is mapped by Spark as an ArrayType with nullable 
> element
>  |-- valuations: array (nullable = true)
>  ||-- element: double (containsNull = true)
> This could be read back as a Dataset in Spark 1.6.0
> val df = sqlContext.table("valuations").as[Valuation]
> But with Spark 1.6.1 the same fails with
> val df = sqlContext.table("valuations").as[Valuation]
> org.apache.spark.sql.AnalysisException: cannot resolve 'cast(valuations as 
> array)' due to data type mismatch: cannot cast 
> ArrayType(DoubleType,true) to ArrayType(DoubleType,false);
> Here are the classes I am using
> case class Valuation(tradeId : String,
>  counterparty: String,
>  nettingAgreement: String,
>  wrongWay: Boolean,
>  valuations : Seq[Double], /* one per scenario */
>  timeInterval: Int,
>  jobId: String)  /* used for hdfs partitioning */
> val vals : Seq[Valuation] = Seq()
> val valsDF = sqlContext.sparkContext.parallelize(vals).toDF
> valsDF.write.partitionBy("jobId").mode(SaveMode.Overwrite).saveAsTable("valuations")
> even the following gives the same result
> val valsDF = vals.toDS.toDF






[jira] [Updated] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-01-30 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-13101:
---
Fix Version/s: (was: 1.6.1)

> Dataset complex types mapping to DataFrame  (element nullability) mismatch
> --
>
> Key: SPARK-13101
> URL: https://issues.apache.org/jira/browse/SPARK-13101
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Deenar Toraskar
>Priority: Blocker
>
> There seems to be a regression between 1.6.0 and 1.6.1 (snapshot build). By 
> default a scala Seq[Double] is mapped by Spark as an ArrayType with nullable 
> element
>  |-- valuations: array (nullable = true)
>  ||-- element: double (containsNull = true)
> This could be read back as a Dataset in Spark 1.6.0
> val df = sqlContext.table("valuations").as[Valuation]
> But with Spark 1.6.1 the same fails with
> val df = sqlContext.table("valuations").as[Valuation]
> org.apache.spark.sql.AnalysisException: cannot resolve 'cast(valuations as 
> array)' due to data type mismatch: cannot cast 
> ArrayType(DoubleType,true) to ArrayType(DoubleType,false);
> Here are the classes I am using
> case class Valuation(tradeId : String,
>  counterparty: String,
>  nettingAgreement: String,
>  wrongWay: Boolean,
>  valuations : Seq[Double], /* one per scenario */
>  timeInterval: Int,
>  jobId: String)  /* used for hdfs partitioning */
> val vals : Seq[Valuation] = Seq()
> val valsDF = sqlContext.sparkContext.parallelize(vals).toDF
> valsDF.write.partitionBy("jobId").mode(SaveMode.Overwrite).saveAsTable("valuations")
> even the following gives the same result
> val valsDF = vals.toDS.toDF






[jira] [Commented] (SPARK-13104) Spark Metrics currently does not return executors hostname

2016-01-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125083#comment-15125083
 ] 

Josh Rosen commented on SPARK-13104:


Which Spark version? And which metrics? TaskMetrics? The Codahale metrics?

> Spark Metrics currently does not return executors hostname 
> ---
>
> Key: SPARK-13104
> URL: https://issues.apache.org/jira/browse/SPARK-13104
> Project: Spark
>  Issue Type: Question
>Reporter: Karthik
>Priority: Critical
>  Labels: executor, executorId, graphite, hostname, metrics
>
> We have been using Spark Metrics and porting the data to InfluxDB using the 
> Graphite sink that is available in Spark. From what I can see, it only 
> provides the executorId and not the executor hostname. With each Spark job, 
> the executorId changes. Is there any way to find the hostname based on the 
> executorId?






[jira] [Commented] (SPARK-13085) Add scalastyle command used in build testing

2016-01-30 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125067#comment-15125067
 ] 

Marcelo Vanzin commented on SPARK-13085:


No harm in keeping this open until we can upgrade scalastyle.

> Add scalastyle command used in build testing
> 
>
> Key: SPARK-13085
> URL: https://issues.apache.org/jira/browse/SPARK-13085
> Project: Spark
>  Issue Type: Wish
>  Components: Build, Tests
>Reporter: Charles Allen
>
> As an occasional or new contributor, it is easy to screw up scala style. But 
> looking at the output logs (for example 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50300/consoleFull
>  ) it is not obvious how to fix the scala style tests, even when reading the 
> scala style guide.
> {code}
> 
> Running Scala style checks
> 
> Scalastyle checks failed at following occurrences:
> [error] 
> /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala:22:0:
>  import.ordering.wrongOrderInGroup.message
> [error] (core/compile:scalastyle) errors exist
> [error] Total time: 9 s, completed Jan 28, 2016 2:11:00 PM
> [error] running 
> /home/jenkins/workspace/SparkPullRequestBuilder/dev/lint-scala ; received 
> return code 1
> {code}
> The ask is that the command used to check scalastyle be presented in the log, 
> so a developer does not have to wait for the build process to check whether a 
> pull request will pass the scala style checks.






[jira] [Created] (SPARK-13104) Spark Metrics currently does not return executors hostname

2016-01-30 Thread Karthik (JIRA)
Karthik created SPARK-13104:
---

 Summary: Spark Metrics currently does not return executors 
hostname 
 Key: SPARK-13104
 URL: https://issues.apache.org/jira/browse/SPARK-13104
 Project: Spark
  Issue Type: Question
Reporter: Karthik
Priority: Critical


We have been using Spark Metrics and porting the data to InfluxDB using the Graphite 
sink that is available in Spark. From what I can see, it only provides the 
executorId and not the executor hostname. With each Spark job, the executorId 
changes. Is there any way to find the hostname based on the executorId?






[jira] [Commented] (SPARK-13103) HashTF dosn't count TF correctly

2016-01-30 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124937#comment-15124937
 ] 

yuhao yang commented on SPARK-13103:


Thanks for finding this. I'm not sure of the historical reason, but it's 
unusual that HashingTF in Python was implemented independently of the 
Scala version.







> HashTF dosn't count TF correctly
> 
>
> Key: SPARK-13103
> URL: https://issues.apache.org/jira/browse/SPARK-13103
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.6.0
> Environment: Ubuntu 14.04
> Python 3.4.3
>Reporter: Louis Liu
>
> I wrote a Python program to calculate frequencies of n-gram sequences with 
> HashTF, but it generates strange output: it found more "一一下嗎" than "一一下".
> HashTF gets a word's index with hash(), but the hashes of some Chinese words 
> are negative.
> Ex:
> >>> hash('一一下嗎')
> -6433835193350070115
> >>> hash('一一下')
> -5938108283593463272






[jira] [Created] (SPARK-13103) HashTF dosn't count TF correctly

2016-01-30 Thread Louis Liu (JIRA)
Louis Liu created SPARK-13103:
-

 Summary: HashTF dosn't count TF correctly
 Key: SPARK-13103
 URL: https://issues.apache.org/jira/browse/SPARK-13103
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.6.0
 Environment: Ubuntu 14.04
Python 3.4.3
Reporter: Louis Liu


I wrote a Python program to calculate frequencies of n-gram sequences with 
HashTF, but it generates strange output: it found more "一一下嗎" than "一一下".

HashTF gets a word's index with hash(), but the hashes of some Chinese words 
are negative.
Ex:
>>> hash('一一下嗎')
-6433835193350070115
>>> hash('一一下')
-5938108283593463272
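For reference, a rough Scala sketch of the non-negative modulo that the JVM-side HashingTF uses to map an arbitrary, possibly negative, hash code into a feature index; the helper name here is illustrative:
{code}
// Map any Int hash value (including negative ones) into the range [0, numFeatures).
def nonNegativeMod(x: Int, mod: Int): Int = {
  val rawMod = x % mod
  rawMod + (if (rawMod < 0) mod else 0)
}

nonNegativeMod(-593810828, 1 << 20)  // a valid, non-negative feature index
{code}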






[jira] [Commented] (SPARK-13089) spark.ml Naive Bayes user guide

2016-01-30 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124889#comment-15124889
 ] 

yuhao yang commented on SPARK-13089:


I'll start on this.
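A minimal sketch of the kind of example the new section could clip from the examples/ folder (it assumes a DataFrame {{training}} with "label" and "features" columns; the final example code may differ):
{code}
import org.apache.spark.ml.classification.NaiveBayes

// Fit the DataFrame-based NaiveBayes estimator and score the training data.
val model = new NaiveBayes().fit(training)
model.transform(training)
  .select("label", "prediction")
  .show()
{code}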

> spark.ml Naive Bayes user guide
> ---
>
> Key: SPARK-13089
> URL: https://issues.apache.org/jira/browse/SPARK-13089
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Add section in ml-classification.md for NaiveBayes DataFrame-based API, plus 
> example code (using include_example to clip code from examples/ folder files).






[jira] [Resolved] (SPARK-13099) ccjlbr

2016-01-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13099.
---
Resolution: Invalid

Assuming this was a typo

> ccjlbr
> --
>
> Key: SPARK-13099
> URL: https://issues.apache.org/jira/browse/SPARK-13099
> Project: Spark
>  Issue Type: Bug
>Reporter: Michael Armbrust
>







[jira] [Updated] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala

2016-01-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13100:
--
  Labels:   (was: performance)
Priority: Minor  (was: Major)

> improving the performance of stringToDate method in DateTimeUtils.scala
> ---
>
> Key: SPARK-13100
> URL: https://issues.apache.org/jira/browse/SPARK-13100
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.0
>Reporter: Yang Wang
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> In the stringToDate method in DateTimeUtils.scala, in order to create a 
> Calendar instance we create a brand new TimeZone instance every time by 
> calling TimeZone.getTimeZone("GMT"). In jdk1.7, however,  this method is 
> synchronized, thus such an approach can cause significant performance loss. 
> Since the same time zone is used each time we call that method, I think we 
> should create a val in the DateTimeUtils singleton object to hold that 
> TimeZone, and use it every time.






[jira] [Updated] (SPARK-13102) Run query using ThriftServer, and open web using IE11, i click ”+detail" in SQLPage, but not response

2016-01-30 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-13102:
--
Attachment: dag info is blank.png

Using IE11, I click "DAG Visualization" in the Stages page, but get nothing.

> Run query using ThriftServer, and open web using IE11, i  click ”+detail" in 
> SQLPage, but not response
> --
>
> Key: SPARK-13102
> URL: https://issues.apache.org/jira/browse/SPARK-13102
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: KaiXinXIaoLei
>  Labels: UI
> Fix For: 2.0.0
>
> Attachments: dag info is blank.png, details in SQLPage.png
>
>
> I run a query using ThriftServer and open the web UI using IE11. Then I click 
> "+detail" in the SQL page, but there is no response. And when I click "DAG 
> Visualization" in the Stages page, I get nothing.






[jira] [Updated] (SPARK-13102) Run query using ThriftServer, and open web using IE11, i click ”+detail" in SQLPage, but not response

2016-01-30 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-13102:
--
Attachment: details in SQLPage.png

I click "+detail" in the SQL page, but it has no response.

> Run query using ThriftServer, and open web using IE11, i  click ”+detail" in 
> SQLPage, but not response
> --
>
> Key: SPARK-13102
> URL: https://issues.apache.org/jira/browse/SPARK-13102
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: KaiXinXIaoLei
>  Labels: UI
> Fix For: 2.0.0
>
> Attachments: details in SQLPage.png
>
>
> I run a query using ThriftServer and open the web UI using IE11. Then I click 
> "+detail" in the SQL page, but there is no response. And when I click "DAG 
> Visualization" in the Stages page, I get nothing.






[jira] [Updated] (SPARK-13102) Run query using ThriftServer, and open web using IE11, i click ”+detail" in SQLPage, but not response

2016-01-30 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-13102:
--
Component/s: Web UI

> Run query using ThriftServer, and open web using IE11, i  click ”+detail" in 
> SQLPage, but not response
> --
>
> Key: SPARK-13102
> URL: https://issues.apache.org/jira/browse/SPARK-13102
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: KaiXinXIaoLei
>  Labels: UI
> Fix For: 2.0.0
>
>
> I run a query using ThriftServer and open the web UI using IE11. Then I click 
> "+detail" in the SQL page, but there is no response. And when I click "DAG 
> Visualization" in the Stages page, I get nothing.






[jira] [Updated] (SPARK-13102) Run query using ThriftServer, and open web using IE11, i click ”+detail" in SQLPage, but not response

2016-01-30 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-13102:
--
Fix Version/s: 2.0.0

> Run query using ThriftServer, and open web using IE11, i  click ”+detail" in 
> SQLPage, but not response
> --
>
> Key: SPARK-13102
> URL: https://issues.apache.org/jira/browse/SPARK-13102
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: KaiXinXIaoLei
>  Labels: UI
> Fix For: 2.0.0
>
>
> I run a query using ThriftServer and open the web UI using IE11. Then I click 
> "+detail" in the SQL page, but there is no response. And when I click "DAG 
> Visualization" in the Stages page, I get nothing.






[jira] [Updated] (SPARK-13102) Run query using ThriftServer, and open web using IE11, i click ”+detail" in SQLPage, but not response

2016-01-30 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-13102:
--
Affects Version/s: 1.6.0

> Run query using ThriftServer, and open web using IE11, i  click ”+detail" in 
> SQLPage, but not response
> --
>
> Key: SPARK-13102
> URL: https://issues.apache.org/jira/browse/SPARK-13102
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: KaiXinXIaoLei
>  Labels: UI
> Fix For: 2.0.0
>
>
> I run a query using ThriftServer and open the web UI using IE11. Then I click 
> "+detail" in the SQL page, but there is no response. And when I click "DAG 
> Visualization" in the Stages page, I get nothing.






[jira] [Updated] (SPARK-13102) Run query using ThriftServer, and open web using IE11, i click ”+detail" in SQLPage, but not response

2016-01-30 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-13102:
--
Labels: UI  (was: )

> Run query using ThriftServer, and open web using IE11, i  click ”+detail" in 
> SQLPage, but not response
> --
>
> Key: SPARK-13102
> URL: https://issues.apache.org/jira/browse/SPARK-13102
> Project: Spark
>  Issue Type: Bug
>Reporter: KaiXinXIaoLei
>  Labels: UI
>
> I run a query using ThriftServer and open the web UI using IE11. Then I click 
> "+detail" in the SQL page, but there is no response. And when I click "DAG 
> Visualization" in the Stages page, I get nothing.






[jira] [Created] (SPARK-13102) Run query using ThriftServer, and open web using IE11, i click ”+detail" in SQLPage, but not response

2016-01-30 Thread KaiXinXIaoLei (JIRA)
KaiXinXIaoLei created SPARK-13102:
-

 Summary: Run query using ThriftServer, and open web using IE11, i  
click ”+detail" in SQLPage, but not response
 Key: SPARK-13102
 URL: https://issues.apache.org/jira/browse/SPARK-13102
 Project: Spark
  Issue Type: Bug
Reporter: KaiXinXIaoLei


I run a query using ThriftServer and open the web UI using IE11. Then I click 
"+detail" in the SQL page, but there is no response. And when I click "DAG 
Visualization" in the Stages page, I get nothing.






[jira] [Created] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-01-30 Thread Deenar Toraskar (JIRA)
Deenar Toraskar created SPARK-13101:
---

 Summary: Dataset complex types mapping to DataFrame  (element 
nullability) mismatch
 Key: SPARK-13101
 URL: https://issues.apache.org/jira/browse/SPARK-13101
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.1
Reporter: Deenar Toraskar
 Fix For: 1.6.1


There seems to be a regression between 1.6.0 and 1.6.1 (snapshot build). By 
default a scala Seq[Double] is mapped by Spark as an ArrayType with nullable 
element

 |-- valuations: array (nullable = true)
 ||-- element: double (containsNull = true)

This could be read back as a Dataset in Spark 1.6.0

val df = sqlContext.table("valuations").as[Valuation]

But with Spark 1.6.1 the same fails with
val df = sqlContext.table("valuations").as[Valuation]

org.apache.spark.sql.AnalysisException: cannot resolve 'cast(valuations as 
array)' due to data type mismatch: cannot cast 
ArrayType(DoubleType,true) to ArrayType(DoubleType,false);

Here are the classes I am using

case class Valuation(tradeId : String,
 counterparty: String,
 nettingAgreement: String,
 wrongWay: Boolean,
 valuations : Seq[Double], /* one per scenario */
 timeInterval: Int,
 jobId: String)  /* used for hdfs partitioning */

val vals : Seq[Valuation] = Seq()
val valsDF = sqlContext.sparkContext.parallelize(vals).toDF
valsDF.write.partitionBy("jobId").mode(SaveMode.Overwrite).saveAsTable("valuations")

even the following gives the same result
val valsDF = vals.toDS.toDF
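One way to see the two schemas side by side (a sketch that relies on the case class above and the 1.6 implicits; the literal values are placeholders):

{code}
// What was written to the table: array element containsNull = true.
sqlContext.table("valuations").printSchema()

// What the Dataset encoder derives for Seq[Double]: element containsNull = false,
// which is the target type of the cast that 1.6.1 refuses.
import sqlContext.implicits._
Seq(Valuation("t1", "cp1", "na1", false, Seq(1.0), 1, "job1")).toDS().toDF().printSchema()
{code}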







[jira] [Assigned] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala

2016-01-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13100:


Assignee: (was: Apache Spark)

> improving the performance of stringToDate method in DateTimeUtils.scala
> ---
>
> Key: SPARK-13100
> URL: https://issues.apache.org/jira/browse/SPARK-13100
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.0
>Reporter: Yang Wang
>  Labels: performance
> Attachments: screenshot-1.png
>
>
> In the stringToDate method in DateTimeUtils.scala, in order to create a 
> Calendar instance we create a brand new TimeZone instance every time by 
> calling TimeZone.getTimeZone("GMT"). In jdk1.7, however,  this method is 
> synchronized, thus such an approach can cause significant performance loss. 
> Since the same time zone is used each time we call that method, I think we 
> should create a val in the DateTimeUtils singleton object to hold that 
> TimeZone, and use it every time.






[jira] [Commented] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala

2016-01-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124810#comment-15124810
 ] 

Apache Spark commented on SPARK-13100:
--

User 'wangyang1992' has created a pull request for this issue:
https://github.com/apache/spark/pull/10994

> improving the performance of stringToDate method in DateTimeUtils.scala
> ---
>
> Key: SPARK-13100
> URL: https://issues.apache.org/jira/browse/SPARK-13100
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.0
>Reporter: Yang Wang
>  Labels: performance
> Attachments: screenshot-1.png
>
>
> In the stringToDate method in DateTimeUtils.scala, in order to create a 
> Calendar instance we create a brand new TimeZone instance every time by 
> calling TimeZone.getTimeZone("GMT"). In jdk1.7, however,  this method is 
> synchronized, thus such an approach can cause significant performance loss. 
> Since the same time zone is used each time we call that method, I think we 
> should create a val in the DateTimeUtils singleton object to hold that 
> TimeZone, and use it every time.






[jira] [Assigned] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala

2016-01-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13100:


Assignee: Apache Spark

> improving the performance of stringToDate method in DateTimeUtils.scala
> ---
>
> Key: SPARK-13100
> URL: https://issues.apache.org/jira/browse/SPARK-13100
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.0
>Reporter: Yang Wang
>Assignee: Apache Spark
>  Labels: performance
> Attachments: screenshot-1.png
>
>
> In the stringToDate method in DateTimeUtils.scala, in order to create a 
> Calendar instance we create a brand new TimeZone instance every time by 
> calling TimeZone.getTimeZone("GMT"). In jdk1.7, however,  this method is 
> synchronized, thus such an approach can cause significant performance loss. 
> Since the same time zone is used each time we call that method, I think we 
> should create a val in the DateTimeUtils singleton object to hold that 
> TimeZone, and use it every time.






[jira] [Updated] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala

2016-01-30 Thread Yang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated SPARK-13100:
--
Attachment: screenshot-1.png

> improving the performance of stringToDate method in DateTimeUtils.scala
> ---
>
> Key: SPARK-13100
> URL: https://issues.apache.org/jira/browse/SPARK-13100
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.0
>Reporter: Yang Wang
>  Labels: performance
> Attachments: screenshot-1.png
>
>
> In the stringToDate method in DateTimeUtils.scala, in order to create a 
> Calendar instance we create a brand new TimeZone instance every time by 
> calling TimeZone.getTimeZone("GMT"). In jdk1.7, however,  this method is 
> synchronized, thus such an approach can cause significant performance loss. 
> Since the same time zone is used each time we call that method, I think we 
> should create a val in the DateTimeUtils singleton object to hold that 
> TimeZone, and use it every time.






[jira] [Created] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala

2016-01-30 Thread Yang Wang (JIRA)
Yang Wang created SPARK-13100:
-

 Summary: improving the performance of stringToDate method in 
DateTimeUtils.scala
 Key: SPARK-13100
 URL: https://issues.apache.org/jira/browse/SPARK-13100
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.6.0, 1.5.2
Reporter: Yang Wang


In the stringToDate method in DateTimeUtils.scala, in order to create a 
Calendar instance we create a brand new TimeZone instance every time by calling 
TimeZone.getTimeZone("GMT"). In jdk1.7, however,  this method is synchronized, 
thus such an approach can cause significant performance loss. Since the same 
time zone is used each time we call that method, I think we should create a val 
in the DateTimeUtils singleton object to hold that TimeZone, and use it every 
time.
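A minimal sketch of the idea (the object and method names here are illustrative, not the exact code in DateTimeUtils.scala):

{code}
import java.util.{Calendar, TimeZone}

object DateTimeUtilsSketch {
  // Look up the GMT TimeZone once; TimeZone.getTimeZone is synchronized on JDK 1.7,
  // so calling it on every stringToDate invocation serializes callers under contention.
  private val gmtTimeZone: TimeZone = TimeZone.getTimeZone("GMT")

  def newGmtCalendar(): Calendar = Calendar.getInstance(gmtTimeZone)
}
{code}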








[jira] [Updated] (SPARK-6363) Switch to Scala 2.11 for default build

2016-01-30 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6363:
---
Summary: Switch to Scala 2.11 for default build  (was: make scala 2.11 
default language)

> Switch to Scala 2.11 for default build
> --
>
> Key: SPARK-6363
> URL: https://issues.apache.org/jira/browse/SPARK-6363
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: antonkulaga
>Assignee: Josh Rosen
>Priority: Minor
>  Labels: releasenotes
> Fix For: 2.0.0
>
>
> Most libraries have already moved to 2.11 and many are starting to drop 2.10 
> support. So it would be better if the Spark binaries were built with Scala 
> 2.11 by default.






[jira] [Updated] (SPARK-6363) make scala 2.11 default language

2016-01-30 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6363:
---
Labels: releasenotes  (was: scala)

> make scala 2.11 default language
> 
>
> Key: SPARK-6363
> URL: https://issues.apache.org/jira/browse/SPARK-6363
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: antonkulaga
>Assignee: Josh Rosen
>Priority: Minor
>  Labels: releasenotes
> Fix For: 2.0.0
>
>
> Most libraries have already moved to 2.11 and many are starting to drop 2.10 
> support. So it would be better if the Spark binaries were built with Scala 
> 2.11 by default.






[jira] [Resolved] (SPARK-6363) make scala 2.11 default language

2016-01-30 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-6363.

   Resolution: Fixed
Fix Version/s: 2.0.0

> make scala 2.11 default language
> 
>
> Key: SPARK-6363
> URL: https://issues.apache.org/jira/browse/SPARK-6363
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: antonkulaga
>Assignee: Josh Rosen
>Priority: Minor
>  Labels: releasenotes
> Fix For: 2.0.0
>
>
> Most libraries have already moved to 2.11 and many are starting to drop 2.10 
> support. So it would be better if the Spark binaries were built with Scala 
> 2.11 by default.


