[jira] [Commented] (SPARK-12715) Improve test coverage
[ https://issues.apache.org/jira/browse/SPARK-12715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125247#comment-15125247 ]

Reynold Xin commented on SPARK-12715:
-------------------------------------

[~davies] are there specific things you have in mind? cc [~hvanhovell]

> Improve test coverage
> ---------------------
>
>                 Key: SPARK-12715
>                 URL: https://issues.apache.org/jira/browse/SPARK-12715
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Davies Liu
>
> We could bring in all the parser test cases into Spark, to make sure we will
> not break compatibility with Hive (we could do more and skip some of them
> that do not make sense).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12772) Better error message for parsing failure?
[ https://issues.apache.org/jira/browse/SPARK-12772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125246#comment-15125246 ]

Reynold Xin commented on SPARK-12772:
-------------------------------------

cc [~hvanhovell] / [~viirya] any idea about this one?

> Better error message for parsing failure?
> -----------------------------------------
>
>                 Key: SPARK-12772
>                 URL: https://issues.apache.org/jira/browse/SPARK-12772
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>
> {code}
> scala> sql("select case if(true, 'one', 'two')").explain(true)
> org.apache.spark.sql.AnalysisException: org.antlr.runtime.EarlyExitException
> line 1:34 required (...)+ loop did not match anything at input '' in
> case expression
> ; line 1 pos 34
>   at org.apache.spark.sql.catalyst.parser.ParseErrorReporter.throwError(ParseDriver.scala:140)
>   at org.apache.spark.sql.catalyst.parser.ParseErrorReporter.throwError(ParseDriver.scala:129)
>   at org.apache.spark.sql.catalyst.parser.ParseDriver$.parse(ParseDriver.scala:77)
>   at org.apache.spark.sql.catalyst.CatalystQl.createPlan(CatalystQl.scala:53)
>   at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
>   at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
> {code}
> Is there a way to say something better other than "required (...)+ loop did
> not match anything at input"?
[jira] [Resolved] (SPARK-12689) Migrate DDL parsing to the newly absorbed parser
[ https://issues.apache.org/jira/browse/SPARK-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-12689.
---------------------------------
       Resolution: Fixed
         Assignee: Liang-Chi Hsieh
    Fix Version/s: 2.0.0

> Migrate DDL parsing to the newly absorbed parser
> ------------------------------------------------
>
>                 Key: SPARK-12689
>                 URL: https://issues.apache.org/jira/browse/SPARK-12689
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Liang-Chi Hsieh
>             Fix For: 2.0.0
[jira] [Resolved] (SPARK-13070) Points out which physical file is the trouble maker when Parquet schema merging fails
[ https://issues.apache.org/jira/browse/SPARK-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-13070.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Points out which physical file is the trouble maker when Parquet schema
> merging fails
> -----------------------------------------------------------------------
>
>                 Key: SPARK-13070
>                 URL: https://issues.apache.org/jira/browse/SPARK-13070
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>            Priority: Minor
>             Fix For: 2.0.0
>
> As a user, I'd like to know which physical file is the trouble maker when
> Parquet schema merging fails. Currently, we only have an error message like
> this:
> {quote}
> Failed to merge incompatible data types LongType and IntegerType
> {quote}
> Would be nice to add the file path and the actual schema.
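The improvement described above (attach the offending file path and schema to the merge error) can be illustrated with a small sketch. This is hypothetical Python, not Spark's actual Scala implementation; `merge_field`, `merge_schemas`, and the `(path, schema)` representation are invented for illustration:

```python
def merge_field(a, b):
    """Stand-in for the low-level type merge that produces the bare
    'Failed to merge incompatible data types ...' message."""
    if a != b:
        raise ValueError(f"Failed to merge incompatible data types {a} and {b}")
    return a


def merge_schemas(files):
    """files: list of (path, {column: type}) pairs, one per physical file.
    On conflict, re-raise with the file path and its schema attached,
    which is exactly the context the issue asks for."""
    merged = {}
    for path, schema in files:
        for col, dtype in schema.items():
            try:
                merged[col] = merge_field(merged.get(col, dtype), dtype)
            except ValueError as e:
                raise ValueError(f"{e} (while reading {path}, schema: {schema})") from e
    return merged
```

With this wrapping, the error names the specific Parquet part-file that introduced the incompatible type instead of only the two types.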
[jira] [Commented] (SPARK-12951) Support spilling in generate aggregate
[ https://issues.apache.org/jira/browse/SPARK-12951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125245#comment-15125245 ]

Apache Spark commented on SPARK-12951:
--------------------------------------

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/10998

> Support spilling in generate aggregate
> --------------------------------------
>
>                 Key: SPARK-12951
>                 URL: https://issues.apache.org/jira/browse/SPARK-12951
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Davies Liu
>            Assignee: Davies Liu
[jira] [Commented] (SPARK-7009) Build assembly JAR via ant to avoid zip64 problems
[ https://issues.apache.org/jira/browse/SPARK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125205#comment-15125205 ]

Zhan Zhang commented on SPARK-7009:
-----------------------------------

Yes. This one is obsolete.

> Build assembly JAR via ant to avoid zip64 problems
> --------------------------------------------------
>
>                 Key: SPARK-7009
>                 URL: https://issues.apache.org/jira/browse/SPARK-7009
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.3.0
>         Environment: Java 7+
>            Reporter: Steve Loughran
>         Attachments: check_spark_python.sh
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> SPARK-1911 shows the problem that JDK7+ is using zip64 to build large JARs; a
> format incompatible with Java and pyspark.
> Provided the total number of .class files+resources is <64K, ant can be used
> to make the final JAR instead, perhaps by unzipping the maven-generated JAR
> then rezipping it with zip64=never, before publishing the artifact via maven.
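The ant `zip64=never` trick suggested in the description can be approximated with Python's standard `zipfile` module, whose `allowZip64=False` flag makes repacking fail fast (raising `zipfile.LargeZipFile`) if the archive would genuinely need zip64, e.g. more than 65,535 entries. A rough sketch, not part of the Spark build; `repack_without_zip64` is a name invented here:

```python
import zipfile


def repack_without_zip64(src_jar, dest_jar):
    """Rewrite a jar entry-by-entry, refusing zip64.

    With allowZip64=False, zipfile raises LargeZipFile if the output would
    actually require zip64 extensions, giving the same fail-fast behavior
    as ant's zip64=never for jars that must stay readable by older tools.
    """
    with zipfile.ZipFile(src_jar) as src, \
         zipfile.ZipFile(dest_jar, "w", zipfile.ZIP_DEFLATED,
                         allowZip64=False) as dest:
        for info in src.infolist():
            dest.writestr(info, src.read(info.filename))
```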
[jira] [Commented] (SPARK-13105) Spark 1.6 and earlier should reject NATURAL JOIN queries instead of returning wrong answers
[ https://issues.apache.org/jira/browse/SPARK-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125199#comment-15125199 ]

Apache Spark commented on SPARK-13105:
--------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/10997

> Spark 1.6 and earlier should reject NATURAL JOIN queries instead of returning
> wrong answers
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-13105
>                 URL: https://issues.apache.org/jira/browse/SPARK-13105
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1, 1.5.2, 1.6.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
> In Spark 1.6 and earlier, Spark SQL does not support {{NATURAL JOIN}}
> queries. However, its SQL parser does not consider {{NATURAL}} to be a
> reserved word, which causes natural joins to be parsed as regular joins where
> the left table has been aliased. For instance,
> {code}
> SELECT * FROM foo NATURAL JOIN bar
> {code}
> gets interpreted as "foo JOIN bar" where "foo" is aliased to "natural".
> Rather than doing this, which leads to confusing / wrong results for users
> who expect NATURAL JOIN behavior, Spark should immediately reject these
> queries at analysis time and should provide an informative error message.
> We're going to add natural join support in Spark 2.0, but for earlier
> versions we should add a bugfix to throw errors.
[jira] [Created] (SPARK-13105) Spark 1.6 and earlier should reject NATURAL JOIN queries instead of returning wrong answers
Josh Rosen created SPARK-13105:
----------------------------------

             Summary: Spark 1.6 and earlier should reject NATURAL JOIN queries instead of returning wrong answers
                 Key: SPARK-13105
                 URL: https://issues.apache.org/jira/browse/SPARK-13105
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.0, 1.5.2, 1.4.1
            Reporter: Josh Rosen
            Assignee: Josh Rosen

In Spark 1.6 and earlier, Spark SQL does not support {{NATURAL JOIN}} queries.
However, its SQL parser does not consider {{NATURAL}} to be a reserved word,
which causes natural joins to be parsed as regular joins where the left table
has been aliased. For instance,
{code}
SELECT * FROM foo NATURAL JOIN bar
{code}
gets interpreted as "foo JOIN bar" where "foo" is aliased to "natural".
Rather than doing this, which leads to confusing / wrong results for users who
expect NATURAL JOIN behavior, Spark should immediately reject these queries at
analysis time and should provide an informative error message.
We're going to add natural join support in Spark 2.0, but for earlier versions
we should add a bugfix to throw errors.
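The wrong-answer risk described above is easiest to see next to a database that does implement NATURAL JOIN. SQLite supports it, so the following Python sketch contrasts the intended natural-join result with the cartesian product that an aliased, condition-less join returns, which is roughly what the buggy parse amounts to. Table names and data are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER, a TEXT);
    CREATE TABLE bar (id INTEGER, b TEXT);
    INSERT INTO foo VALUES (1, 'x'), (2, 'y');
    INSERT INTO bar VALUES (1, 'p'), (3, 'q');
""")

# Intended NATURAL JOIN semantics: equi-join on the shared column "id",
# which appears only once in the output.
natural = conn.execute("SELECT * FROM foo NATURAL JOIN bar").fetchall()
# -> [(1, 'x', 'p')]

# What the buggy parse effectively produces: "foo" aliased (to "natural" in
# Spark's case; a different alias name is used here since NATURAL is a
# keyword in SQLite), joined with no condition at all, i.e. a cross join.
aliased = conn.execute("SELECT * FROM foo AS natural_alias JOIN bar").fetchall()
# -> 4 rows, a completely different result
```

Silently returning the second result instead of the first is why rejecting the query outright is the safer behavior for 1.x.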
[jira] [Commented] (SPARK-7009) Build assembly JAR via ant to avoid zip64 problems
[ https://issues.apache.org/jira/browse/SPARK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125153#comment-15125153 ]

Josh Rosen commented on SPARK-7009:
-----------------------------------

I believe that this will be obsoleted by SPARK-11157, no?

> Build assembly JAR via ant to avoid zip64 problems
> --------------------------------------------------
>
>                 Key: SPARK-7009
>                 URL: https://issues.apache.org/jira/browse/SPARK-7009
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.3.0
>         Environment: Java 7+
>            Reporter: Steve Loughran
>         Attachments: check_spark_python.sh
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> SPARK-1911 shows the problem that JDK7+ is using zip64 to build large JARs; a
> format incompatible with Java and pyspark.
> Provided the total number of .class files+resources is <64K, ant can be used
> to make the final JAR instead, perhaps by unzipping the maven-generated JAR
> then rezipping it with zip64=never, before publishing the artifact via maven.
[jira] [Commented] (SPARK-6305) Add support for log4j 2.x to Spark
[ https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125152#comment-15125152 ]

Josh Rosen commented on SPARK-6305:
-----------------------------------

Hey [~srowen], what do you think the next steps are in evaluating how to
handle log4j 2.x in Spark? Just pinging this now since I'm trying to resolve
major build/dep changes earlier in the 2.0.0 cycle.

> Add support for log4j 2.x to Spark
> ----------------------------------
>
>                 Key: SPARK-6305
>                 URL: https://issues.apache.org/jira/browse/SPARK-6305
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>            Reporter: Tal Sliwowicz
>            Priority: Minor
>
> log4j 2 requires replacing the slf4j binding and adding the log4j jars in the
> classpath. Since there are shaded jars, it must be done during the build.
[jira] [Commented] (SPARK-12154) Upgrade to Jersey 2
[ https://issues.apache.org/jira/browse/SPARK-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125151#comment-15125151 ]

Josh Rosen commented on SPARK-12154:
------------------------------------

Since we're in the middle of the 2.0.0 development cycle right now, it seems
like it would be a good time to revisit upgrading to Jersey 2. [~mcheah],
would you or someone else be interested in helping to scope out this task to
figure out what it's going to require?

> Upgrade to Jersey 2
> -------------------
>
>                 Key: SPARK-12154
>                 URL: https://issues.apache.org/jira/browse/SPARK-12154
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, Spark Core
>    Affects Versions: 1.5.2
>            Reporter: Matt Cheah
>
> Fairly self-explanatory: Jersey 1 is a bit old and could use an upgrade.
> Library conflicts for Jersey are difficult to work around (see discussion on
> SPARK-11081). It's easier to upgrade Jersey entirely, but we should target
> Spark 2.0 since this may be a break for users who were using Jersey 1 in
> their Spark jobs.
[jira] [Commented] (SPARK-11416) Upgrade kryo package to version 3.0
[ https://issues.apache.org/jira/browse/SPARK-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125149#comment-15125149 ]

Josh Rosen commented on SPARK-11416:
------------------------------------

Since we're in the middle of 2.0.0 development, now seems like a good time to
revisit upgrading to Kryo 3. For those of you more familiar with this issue,
could you help me break down this story into some smaller subtasks so we can
make progress? Does this require coordination with any third parties? What are
the changes we need in Spark?

> Upgrade kryo package to version 3.0
> -----------------------------------
>
>                 Key: SPARK-11416
>                 URL: https://issues.apache.org/jira/browse/SPARK-11416
>             Project: Spark
>          Issue Type: Wish
>          Components: Build
>    Affects Versions: 1.5.1
>            Reporter: Hitoshi Ozawa
>
> Would like to have Apache Spark upgrade the kryo package from 2.x (current)
> to 3.x.
[jira] [Commented] (SPARK-7019) Build docs on doc changes
[ https://issues.apache.org/jira/browse/SPARK-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125150#comment-15125150 ]

Josh Rosen commented on SPARK-7019:
-----------------------------------

This would be great to do but is somewhat blocked at the moment by the fact
that the doc building dependencies (some Ruby stuff) aren't installed on all
Jenkins workers.

> Build docs on doc changes
> -------------------------
>
>                 Key: SPARK-7019
>                 URL: https://issues.apache.org/jira/browse/SPARK-7019
>             Project: Spark
>          Issue Type: New Feature
>          Components: Build
>            Reporter: Brennon York
>
> Currently when a pull request changes the {{docs/}} directory, the docs
> aren't actually built. When a PR is submitted, the {{git}} history should be
> checked to see if any doc changes were made and, if so, the docs should be
> properly built and any issues reported.
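The check the issue describes (inspect the {{git}} history for {{docs/}} changes) boils down to filtering a changed-file list. A hypothetical Python sketch; the helper names and the `base_ref...head_ref` diff convention are assumptions made here, not Spark's actual dev/run-tests logic:

```python
import subprocess


def touches_docs(changed_files):
    """Pure decision: does any changed path live under docs/ ?"""
    return any(path.startswith("docs/") for path in changed_files)


def docs_changed(base_ref, head_ref="HEAD"):
    """List files changed between two git refs and test them.

    Uses `git diff --name-only base...head`, i.e. changes on the PR branch
    relative to the merge base, which is the usual shape of a PR check.
    """
    out = subprocess.check_output(
        ["git", "diff", "--name-only", f"{base_ref}...{head_ref}"], text=True)
    return touches_docs(out.splitlines())
```

A CI script would then run the (Ruby-based) jekyll build only when `docs_changed(...)` is true.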
[jira] [Resolved] (SPARK-12822) Change default build to Hadoop 2.7
[ https://issues.apache.org/jira/browse/SPARK-12822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-12822.
--------------------------------
    Resolution: Won't Fix

It sounds like this is a clear "Won't Fix" as long as we continue to support
Hadoop 2.2, so I'm going to close this for now. We can re-open if this
decision changes.

> Change default build to Hadoop 2.7
> ----------------------------------
>
>                 Key: SPARK-12822
>                 URL: https://issues.apache.org/jira/browse/SPARK-12822
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>            Reporter: Reynold Xin
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125147#comment-15125147 ]

Josh Rosen commented on SPARK-11157:
------------------------------------

I'd love to start making progress towards removing the assemblies. Before we
do so, though, I think there are a few subtasks / obstacles that we need to
clear first:

- First, I think we should just completely remove the assembly rather than
  giving both assembly and non-assembly options. Every additional option that
  we provide / support adds lots of maintenance burden and it would be nice to
  standardize on a single supported distribution technique.
- Prior to removing the assemblies, it would be great if we could reconfigure
  our tests to not depend on the full assembly JAR in order to run. We already
  have {{SPARK_PREPEND_CLASSPATH}} today, so this might be as simple as making
  that behavior the default and reconfiguring our test scripts to skip the
  assembly step.
- Building up a {{-classpath}} argument that lists hundreds of JARs is going
  to be a debugging nightmare (lots of tools truncate process arguments past
  some limit, etc.), so it would be good to investigate other techniques that
  we can use to pass the classpath to {{java}} without bloating the CLI (maybe
  using an environment variable or some file or something?).
- This is going to require changes to Launcher, shell scripts, and a few other
  places; it would be good to scope out these changes to estimate how much
  work it's going to be.

[~vanzin], are there any other obvious subtasks that I'm not thinking of? I'd
like to try to see whether we can break down this big task and scope out some
smaller pieces so we can make incremental progress and get this finished well
in time for 2.0.0 so we have lots of time to test.
> Allow Spark to be built without assemblies
> ------------------------------------------
>
>                 Key: SPARK-11157
>                 URL: https://issues.apache.org/jira/browse/SPARK-11157
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Build, Spark Core, YARN
>            Reporter: Marcelo Vanzin
>         Attachments: no-assemblies.pdf
>
> For reasoning, discussion of pros and cons, and other more detailed
> information, please see the attached doc.
> The idea is to be able to build a Spark distribution that has just a
> directory full of jars instead of the huge assembly files we currently have.
> Getting there requires changes in a bunch of places. I'll try to list the
> ones I identified in the document, in the order that I think would be needed
> to not break things:
> * make streaming backends not be assemblies
> Since people may depend on the current assembly artifacts in their
> deployments, we can't really remove them; but we can make them be dummy jars
> and rely on dependency resolution to download all the jars.
> PySpark tests would also need some tweaking here.
> * make examples jar not be an assembly
> Probably requires tweaks to the {{run-example}} script. The location of the
> examples jar would have to change (it won't be able to live in the same place
> as the main Spark jars anymore).
> * update YARN backend to handle a directory full of jars when launching apps
> Currently YARN localizes the Spark assembly (depending on the user
> configuration); it needs to be modified so that it can localize all needed
> libraries instead of a single jar.
> * Modify launcher library to handle the jars directory
> This should be trivial.
> * Modify {{assembly/pom.xml}} to generate an assembly or a {{libs}} directory
> depending on which profile is enabled
> We should keep the option to build with the assembly on by default, for
> backwards compatibility, to give people time to prepare.
> Filing this bug as an umbrella; please file sub-tasks if you plan to work on
> a specific part of the issue.
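One of the options floated in the comment above (pass the classpath via an environment variable or a file rather than a huge `-classpath` argument) starts from computing the jar list once. A hypothetical Python sketch, not anything in Spark's launcher; note that Java 6+ also accepts a wildcard classpath entry like `jars/*` directly, which sidesteps the problem entirely:

```python
import os


def classpath_from_dir(jar_dir):
    """Build an explicit classpath string from a directory full of jars.

    The result can be written into an env var or a file and handed to the
    JVM indirectly, instead of being spelled out on the command line.
    Sorted for determinism; joined with os.pathsep (':' on Unix, ';' on
    Windows), matching what java -classpath expects.
    """
    jars = sorted(f for f in os.listdir(jar_dir) if f.endswith(".jar"))
    return os.pathsep.join(os.path.join(jar_dir, f) for f in jars)
```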
[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125143#comment-15125143 ]

Josh Rosen commented on SPARK-7481:
-----------------------------------

How does this proposal change if we just remove the assembly and ship a folder
of JARs, as has been proposed elsewhere by [~vanzin]? Does that render this
proposal moot?

> Add Hadoop 2.6+ profile to pull in object store FS accessors
> ------------------------------------------------------------
>
>                 Key: SPARK-7481
>                 URL: https://issues.apache.org/jira/browse/SPARK-7481
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.3.1
>            Reporter: Steve Loughran
>
> To keep the s3n classpath right, and to add s3a, swift & azure, the
> dependencies of Spark in a 2.6+ profile need to add the relevant object store
> packages (hadoop-aws, hadoop-openstack, hadoop-azure).
> This adds more stuff to the client bundle, but will mean a single Spark
> package can talk to all of the stores.
[jira] [Resolved] (SPARK-6029) Unshaded "clearspring" classpath leakage + excluded fastutil interferes with apps using clearspring
[ https://issues.apache.org/jira/browse/SPARK-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-6029.
-------------------------------
    Resolution: Incomplete

Resolving as "incomplete", since it's not clear whether this issue is still
valid. If it is, please comment and we can re-open and re-scope. Thanks!

> Unshaded "clearspring" classpath leakage + excluded fastutil interferes with
> apps using clearspring
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-6029
>                 URL: https://issues.apache.org/jira/browse/SPARK-6029
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.2.1
>            Reporter: Jim Kleckner
>            Priority: Minor
>
> Spark includes the clearspring analytics package but intentionally excludes
> the dependencies of the fastutil package.
> Spark includes parquet-column, which includes fastutil and relocates it under
> parquet/ but creates a shaded jar file which is incomplete, because it shades
> out some of the fastutil classes, notably Long2LongOpenHashMap, which is
> present in the fastutil jar file that parquet-column is referencing.
> We are using more of the clearspring classes (e.g. QDigest) and those do
> depend on missing fastutil classes like Long2LongOpenHashMap.
> Even though I add them to our assembly jar file, the class loader finds the
> spark assembly and we get runtime class loader errors when we try to use it.
> The
> [documentation|http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment]
> and possibly related issue
> [SPARK-939|https://issues.apache.org/jira/browse/SPARK-939] suggest arguments
> that I tried with spark-submit:
> {code}
> --conf spark.driver.userClassPathFirst=true \
> --conf spark.executor.userClassPathFirst=true
> {code}
> but we still get the class not found error.
> Could this be a bug with {{userClassPathFirst=true}}? i.e. should it work?
> In any case, would it be reasonable to not exclude the "fastutil"
> dependencies?
> See email discussion
> [here|http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-Spark-excludes-quot-fastutil-quot-dependencies-we-need-tt21812.html]
[jira] [Resolved] (SPARK-5330) Core | Scala 2.11 | Transitive dependency on com.fasterxml.jackson.core :jackson-core:2.3.1 causes compatibility issues
[ https://issues.apache.org/jira/browse/SPARK-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-5330.
-------------------------------
    Resolution: Incomplete

I'm going to resolve this as "incomplete" since I'm not sure whether it's
still valid and there hasn't been any reply to Sean's questions. If this is
still a valid issue, please comment and we can re-open.

> Core | Scala 2.11 | Transitive dependency on
> com.fasterxml.jackson.core:jackson-core:2.3.1 causes compatibility issues
> ----------------------------------------------------------------------
>
>                 Key: SPARK-5330
>                 URL: https://issues.apache.org/jira/browse/SPARK-5330
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.2.0
>            Reporter: Aniket Bhatnagar
>            Priority: Minor
>
> Spark transitively depends on com.fasterxml.jackson.core:jackson-core:2.3.1.
> Users of jackson-module-scala had to depend on the same version to avoid any
> class compatibility issues. However, since Scala 2.11, jackson-module-scala
> is no longer published for version 2.3.1. Since version 2.3.1 is quite old,
> perhaps we should investigate upgrading to the latest jackson-core.
[jira] [Reopened] (SPARK-12261) pyspark crash for large dataset
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen reopened SPARK-12261:
--------------------------------

> pyspark crash for large dataset
> -------------------------------
>
>                 Key: SPARK-12261
>                 URL: https://issues.apache.org/jira/browse/SPARK-12261
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.2
>         Environment: windows
>            Reporter: zihao
>
> I tried to import a local text file (over 100mb) via textFile in pyspark;
> when I ran data.take(), it failed and gave error messages including:
> 15/12/10 17:17:43 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times;
> aborting job
> Traceback (most recent call last):
>   File "E:/spark_python/test3.py", line 9, in 
>     lines.take(5)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\rdd.py", line 1299, in take
>     res = self.context.runJob(self, takeUpToNumLeft, p)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\context.py", line 916, in runJob
>     port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
>   File "C:\Anaconda2\lib\site-packages\py4j\java_gateway.py", line 813, in __call__
>     answer, self.gateway_client, self.target_id, self.name)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\sql\utils.py", line 36, in deco
>     return f(*a, **kw)
>   File "C:\Anaconda2\lib\site-packages\py4j\protocol.py", line 308, in get_return_value
>     format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0
> (TID 0, localhost): java.net.SocketException: Connection reset by peer:
> socket write error
> Then I ran the same code on a small text file; this time .take() worked fine.
> How can I solve this problem?
[jira] [Resolved] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala
[ https://issues.apache.org/jira/browse/SPARK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-13100.
---------------------------------
       Resolution: Fixed
         Assignee: Yang Wang
    Fix Version/s: 2.0.0

> improving the performance of stringToDate method in DateTimeUtils.scala
> -----------------------------------------------------------------------
>
>                 Key: SPARK-13100
>                 URL: https://issues.apache.org/jira/browse/SPARK-13100
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.2, 1.6.0
>            Reporter: Yang Wang
>            Assignee: Yang Wang
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: screenshot-1.png
>
> In the stringToDate method in DateTimeUtils.scala, in order to create a
> Calendar instance we create a brand new TimeZone instance every time by
> calling TimeZone.getTimeZone("GMT"). In jdk1.7, however, this method is
> synchronized, thus such an approach can cause significant performance loss.
> Since the same time zone is used each time we call that method, I think we
> should create a val in the DateTimeUtils singleton object to hold that
> TimeZone, and use it every time.
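The proposed fix is a standard hoisting pattern: perform the constant, potentially contended lookup once at initialization instead of once per call. A Python illustration of the same idea; `_expensive_lookup` is an invented stand-in, and Spark's actual change would be a Scala `val` on the DateTimeUtils singleton object rather than anything shown here:

```python
import time


def _expensive_lookup(zone_id):
    """Stand-in for TimeZone.getTimeZone("GMT"): imagine this call is
    synchronized and therefore slow under contention, as described for
    JDK 1.7 in the issue."""
    time.sleep(0.001)
    return {"id": zone_id}


# The fix: compute the shared instance once at module load, like a `val`
# on a Scala singleton object, instead of once per stringToDate call.
GMT = _expensive_lookup("GMT")


def string_to_date(s):
    tz = GMT  # reuse the cached instance on every call; no per-call lookup
    return (s, tz["id"])
```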
[jira] [Updated] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-13101:
-------------------------------
    Target Version/s: 1.6.1
            Priority: Blocker  (was: Major)

I'm temporarily marking this as a 1.6.1 blocker so that we make sure to
investigate and triage before cutting an RC. /cc [~marmbrus]

> Dataset complex types mapping to DataFrame (element nullability) mismatch
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13101
>                 URL: https://issues.apache.org/jira/browse/SPARK-13101
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Deenar Toraskar
>            Priority: Blocker
>
> There seems to be a regression between 1.6.0 and 1.6.1 (snapshot build). By
> default a Scala Seq[Double] is mapped by Spark as an ArrayType with nullable
> element:
> |-- valuations: array (nullable = true)
> |    |-- element: double (containsNull = true)
> This could be read back as a Dataset in Spark 1.6.0:
> val df = sqlContext.table("valuations").as[Valuation]
> But with Spark 1.6.1 the same fails with:
> val df = sqlContext.table("valuations").as[Valuation]
> org.apache.spark.sql.AnalysisException: cannot resolve 'cast(valuations as
> array)' due to data type mismatch: cannot cast
> ArrayType(DoubleType,true) to ArrayType(DoubleType,false);
> Here are the classes I am using:
> case class Valuation(tradeId: String,
>                      counterparty: String,
>                      nettingAgreement: String,
>                      wrongWay: Boolean,
>                      valuations: Seq[Double], /* one per scenario */
>                      timeInterval: Int,
>                      jobId: String) /* used for hdfs partitioning */
> val vals: Seq[Valuation] = Seq()
> val valsDF = sqlContext.sparkContext.parallelize(vals).toDF
> valsDF.write.partitionBy("jobId").mode(SaveMode.Overwrite).saveAsTable("valuations")
> Even the following gives the same result:
> val valsDF = vals.toDS.toDF
[jira] [Updated] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-13101:
-------------------------------
    Fix Version/s:     (was: 1.6.1)

> Dataset complex types mapping to DataFrame (element nullability) mismatch
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13101
>                 URL: https://issues.apache.org/jira/browse/SPARK-13101
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Deenar Toraskar
>            Priority: Blocker
>
> There seems to be a regression between 1.6.0 and 1.6.1 (snapshot build). By
> default a Scala Seq[Double] is mapped by Spark as an ArrayType with nullable
> element:
> |-- valuations: array (nullable = true)
> |    |-- element: double (containsNull = true)
> This could be read back as a Dataset in Spark 1.6.0:
> val df = sqlContext.table("valuations").as[Valuation]
> But with Spark 1.6.1 the same fails with:
> val df = sqlContext.table("valuations").as[Valuation]
> org.apache.spark.sql.AnalysisException: cannot resolve 'cast(valuations as
> array)' due to data type mismatch: cannot cast
> ArrayType(DoubleType,true) to ArrayType(DoubleType,false);
> Here are the classes I am using:
> case class Valuation(tradeId: String,
>                      counterparty: String,
>                      nettingAgreement: String,
>                      wrongWay: Boolean,
>                      valuations: Seq[Double], /* one per scenario */
>                      timeInterval: Int,
>                      jobId: String) /* used for hdfs partitioning */
> val vals: Seq[Valuation] = Seq()
> val valsDF = sqlContext.sparkContext.parallelize(vals).toDF
> valsDF.write.partitionBy("jobId").mode(SaveMode.Overwrite).saveAsTable("valuations")
> Even the following gives the same result:
> val valsDF = vals.toDS.toDF
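The failing cast in the report above can be modeled in a few lines. This is a toy, not Spark's Catalyst code; the tuple encoding of array types and the `can_cast` rule are assumptions made for illustration, capturing only the relevant behavior: array elements marked nullable ({{containsNull=true}}) cannot be cast to a type that forbids nulls:

```python
def can_cast(from_type, to_type):
    """Array types are modeled as ("array", element_type, contains_null)."""
    if from_type[0] == "array" and to_type[0] == "array":
        if from_type[2] and not to_type[2]:
            # Possibly-null elements can't be cast to a non-nullable slot:
            # this is the "cannot cast ArrayType(DoubleType,true) to
            # ArrayType(DoubleType,false)" failure from the report.
            return False
        return from_type[1] == to_type[1]
    return from_type == to_type


stored  = ("array", "double", True)   # what saveAsTable wrote for Seq[Double]
encoder = ("array", "double", False)  # what .as[Valuation] demands in 1.6.1
```

Tightening nullability (stored → encoder) is rejected, while loosening it (encoder → stored) is fine, which is why the read-back direction is the one that breaks.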
[jira] [Commented] (SPARK-13104) Spark Metrics currently does not return executors hostname
[ https://issues.apache.org/jira/browse/SPARK-13104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125083#comment-15125083 ] Josh Rosen commented on SPARK-13104: Which Spark version? And which metrics? TaskMetrics? The Codahale metrics? > Spark Metrics currently does not return executors hostname > --- > > Key: SPARK-13104 > URL: https://issues.apache.org/jira/browse/SPARK-13104 > Project: Spark > Issue Type: Question >Reporter: Karthik >Priority: Critical > Labels: executor, executorId, graphite, hostname, metrics > > We have been using Spark Metrics and porting the data to InfluxDB using the > Graphite sink that is available in Spark. From what I can see, it only > provides the executorId and not the executor hostname. With each Spark job, > the executorId changes. Is there any way to find the hostname based on the > executorId?
[jira] [Commented] (SPARK-13085) Add scalastyle command used in build testing
[ https://issues.apache.org/jira/browse/SPARK-13085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125067#comment-15125067 ] Marcelo Vanzin commented on SPARK-13085: No harm in keeping this open until we can upgrade scalastyle. > Add scalastyle command used in build testing > > > Key: SPARK-13085 > URL: https://issues.apache.org/jira/browse/SPARK-13085 > Project: Spark > Issue Type: Wish > Components: Build, Tests >Reporter: Charles Allen > > As an occasional or new contributor, it is easy to get Scala style wrong. But > looking at the output logs (for example > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50300/consoleFull > ) it is not obvious how to fix the Scala style failures, even when reading the > Scala style guide. > {code} > > Running Scala style checks > > Scalastyle checks failed at following occurrences: > [error] > /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala:22:0: > import.ordering.wrongOrderInGroup.message > [error] (core/compile:scalastyle) errors exist > [error] Total time: 9 s, completed Jan 28, 2016 2:11:00 PM > [error] running > /home/jenkins/workspace/SparkPullRequestBuilder/dev/lint-scala ; received > return code 1 > {code} > The ask is that the command used to check scalastyle be presented in the log, > so a developer does not have to wait for the build process to check whether a > pull request will pass the Scala style checks.
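For what it's worth, the command is visible at the end of the quoted log: Jenkins invokes dev/lint-scala. Assuming a checkout of the Spark repo, the same check can be run locally before pushing (sketch based only on the script path shown in the log above):

```shell
# From the Spark repo root: run the same Scala style check Jenkins runs.
# (Path taken from the quoted build log.)
./dev/lint-scala
```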
[jira] [Created] (SPARK-13104) Spark Metrics currently does not return executors hostname
Karthik created SPARK-13104: --- Summary: Spark Metrics currently does not return executors hostname Key: SPARK-13104 URL: https://issues.apache.org/jira/browse/SPARK-13104 Project: Spark Issue Type: Question Reporter: Karthik Priority: Critical We have been using Spark Metrics and porting the data to InfluxDB using the Graphite sink that is available in Spark. From what I can see, it only provides the executorId and not the executor hostname. With each Spark job, the executorId changes. Is there any way to find the hostname based on the executorId?
[jira] [Commented] (SPARK-13103) HashTF doesn't count TF correctly
[ https://issues.apache.org/jira/browse/SPARK-13103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124937#comment-15124937 ] yuhao yang commented on SPARK-13103: Thanks for finding this. I'm not sure of the historical reason, but it is unusual that HashingTF in Python was implemented independently of the Scala version. > HashTF doesn't count TF correctly > > > Key: SPARK-13103 > URL: https://issues.apache.org/jira/browse/SPARK-13103 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.6.0 > Environment: Ubuntu 14.04 > Python 3.4.3 >Reporter: Louis Liu > > I wrote a Python program to calculate frequencies of n-gram sequences with > HashTF, > but it generates strange output: it found more "一一下嗎" than "一一下". > HashTF gets a word's index with hash(), > but hashes of some Chinese words are negative. > Ex: > >>> hash('一一下嗎') > -6433835193350070115 > >>> hash('一一下') > -5938108283593463272
[jira] [Created] (SPARK-13103) HashTF doesn't count TF correctly
Louis Liu created SPARK-13103: - Summary: HashTF doesn't count TF correctly Key: SPARK-13103 URL: https://issues.apache.org/jira/browse/SPARK-13103 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.6.0 Environment: Ubuntu 14.04 Python 3.4.3 Reporter: Louis Liu I wrote a Python program to calculate frequencies of n-gram sequences with HashTF, but it generates strange output: it found more "一一下嗎" than "一一下". HashTF gets a word's index with hash(), but hashes of some Chinese words are negative. Ex: >>> hash('一一下嗎') -6433835193350070115 >>> hash('一一下') -5938108283593463272
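One caveat on the negative-hash theory: in Python, a negative hash() by itself does not produce a bad bucket index, because % with a positive modulus already returns a non-negative result; the surprising counts are more plausibly bucket collisions (and, on Python 3.3+, hash randomization across runs unless PYTHONHASHSEED is fixed). It is the JVM side that needs an explicit non-negative mod, since Java's % can return negatives. A hedged sketch of the term-to-bucket mapping (the function name is illustrative, not Spark's API):

```python
# Illustrative term -> bucket mapping for a hashing TF; not Spark's code.

def non_negative_mod(x: int, mod: int) -> int:
    """JVM-style guard: Java's % can be negative, so Scala code uses
    ((x % mod) + mod) % mod. In Python, x % mod with a positive modulus
    is already in [0, mod), so both forms agree."""
    return ((x % mod) + mod) % mod

n = 1 << 20  # number of feature buckets
for h in (-6433835193350070115, -5938108283593463272, 42):
    i = non_negative_mod(h, n)
    assert 0 <= i < n
    assert i == h % n  # Python's plain % already lands in range

print("all indices in range")
```

So distinct n-grams can still share a bucket and inflate each other's counts; that is inherent to hashing TF, independent of the sign of the hash.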
[jira] [Commented] (SPARK-13089) spark.ml Naive Bayes user guide
[ https://issues.apache.org/jira/browse/SPARK-13089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124889#comment-15124889 ] yuhao yang commented on SPARK-13089: I'll start on this. > spark.ml Naive Bayes user guide > --- > > Key: SPARK-13089 > URL: https://issues.apache.org/jira/browse/SPARK-13089 > Project: Spark > Issue Type: Documentation > Components: Documentation, ML >Reporter: Joseph K. Bradley >Priority: Minor > > Add a section in ml-classification.md for the NaiveBayes DataFrame-based API, plus > example code (using include_example to clip code from examples/ folder files).
[jira] [Resolved] (SPARK-13099) ccjlbr
[ https://issues.apache.org/jira/browse/SPARK-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-13099. --- Resolution: Invalid Assuming this was a typo > ccjlbr > -- > > Key: SPARK-13099 > URL: https://issues.apache.org/jira/browse/SPARK-13099 > Project: Spark > Issue Type: Bug >Reporter: Michael Armbrust >
[jira] [Updated] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala
[ https://issues.apache.org/jira/browse/SPARK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-13100: -- Labels: (was: performance) Priority: Minor (was: Major) > improving the performance of stringToDate method in DateTimeUtils.scala > --- > > Key: SPARK-13100 > URL: https://issues.apache.org/jira/browse/SPARK-13100 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.2, 1.6.0 >Reporter: Yang Wang >Priority: Minor > Attachments: screenshot-1.png > > > In the stringToDate method in DateTimeUtils.scala, in order to create a > Calendar instance we create a brand new TimeZone instance every time by > calling TimeZone.getTimeZone("GMT"). In jdk1.7, however, this method is > synchronized, thus such an approach can cause significant performance loss. > Since the same time zone is used each time we call that method, I think we > should create a val in the DateTimeUtils singleton object to hold that > TimeZone, and use it every time.
[jira] [Updated] (SPARK-13102) Run query using ThriftServer and open web UI using IE11; clicking "+detail" in SQLPage gets no response
[ https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-13102: -- Attachment: dag info is blank.png Using IE11, I click "DAG Visualization" in StagesPage, but get nothing. > Run query using ThriftServer and open web UI using IE11; clicking "+detail" in > SQLPage gets no response > -- > > Key: SPARK-13102 > URL: https://issues.apache.org/jira/browse/SPARK-13102 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 >Reporter: KaiXinXIaoLei > Labels: UI > Fix For: 2.0.0 > > Attachments: dag info is blank.png, details in SQLPage.png > > > I run a query using ThriftServer and open the web UI using IE11. When I click > "+detail" in SQLPage, there is no response. And when I click "DAG > Visualization" in StagesPage, I get nothing.
[jira] [Updated] (SPARK-13102) Run query using ThriftServer and open web UI using IE11; clicking "+detail" in SQLPage gets no response
[ https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-13102: -- Attachment: details in SQLPage.png I click "+detail" in SQLPage, but it has no response. > Run query using ThriftServer and open web UI using IE11; clicking "+detail" in > SQLPage gets no response > -- > > Key: SPARK-13102 > URL: https://issues.apache.org/jira/browse/SPARK-13102 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 >Reporter: KaiXinXIaoLei > Labels: UI > Fix For: 2.0.0 > > Attachments: details in SQLPage.png > > > I run a query using ThriftServer and open the web UI using IE11. When I click > "+detail" in SQLPage, there is no response. And when I click "DAG > Visualization" in StagesPage, I get nothing.
[jira] [Updated] (SPARK-13102) Run query using ThriftServer and open web UI using IE11; clicking "+detail" in SQLPage gets no response
[ https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-13102: -- Component/s: Web UI > Run query using ThriftServer and open web UI using IE11; clicking "+detail" in > SQLPage gets no response > -- > > Key: SPARK-13102 > URL: https://issues.apache.org/jira/browse/SPARK-13102 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 >Reporter: KaiXinXIaoLei > Labels: UI > Fix For: 2.0.0 > > > I run a query using ThriftServer and open the web UI using IE11. When I click > "+detail" in SQLPage, there is no response. And when I click "DAG > Visualization" in StagesPage, I get nothing.
[jira] [Updated] (SPARK-13102) Run query using ThriftServer and open web UI using IE11; clicking "+detail" in SQLPage gets no response
[ https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-13102: -- Fix Version/s: 2.0.0 > Run query using ThriftServer and open web UI using IE11; clicking "+detail" in > SQLPage gets no response > -- > > Key: SPARK-13102 > URL: https://issues.apache.org/jira/browse/SPARK-13102 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 >Reporter: KaiXinXIaoLei > Labels: UI > Fix For: 2.0.0 > > > I run a query using ThriftServer and open the web UI using IE11. When I click > "+detail" in SQLPage, there is no response. And when I click "DAG > Visualization" in StagesPage, I get nothing.
[jira] [Updated] (SPARK-13102) Run query using ThriftServer and open web UI using IE11; clicking "+detail" in SQLPage gets no response
[ https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-13102: -- Affects Version/s: 1.6.0 > Run query using ThriftServer and open web UI using IE11; clicking "+detail" in > SQLPage gets no response > -- > > Key: SPARK-13102 > URL: https://issues.apache.org/jira/browse/SPARK-13102 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 >Reporter: KaiXinXIaoLei > Labels: UI > Fix For: 2.0.0 > > > I run a query using ThriftServer and open the web UI using IE11. When I click > "+detail" in SQLPage, there is no response. And when I click "DAG > Visualization" in StagesPage, I get nothing.
[jira] [Updated] (SPARK-13102) Run query using ThriftServer and open web UI using IE11; clicking "+detail" in SQLPage gets no response
[ https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-13102: -- Labels: UI (was: ) > Run query using ThriftServer and open web UI using IE11; clicking "+detail" in > SQLPage gets no response > -- > > Key: SPARK-13102 > URL: https://issues.apache.org/jira/browse/SPARK-13102 > Project: Spark > Issue Type: Bug >Reporter: KaiXinXIaoLei > Labels: UI > > I run a query using ThriftServer and open the web UI using IE11. When I click > "+detail" in SQLPage, there is no response. And when I click "DAG > Visualization" in StagesPage, I get nothing.
[jira] [Created] (SPARK-13102) Run query using ThriftServer and open web UI using IE11; clicking "+detail" in SQLPage gets no response
KaiXinXIaoLei created SPARK-13102: - Summary: Run query using ThriftServer and open web UI using IE11; clicking "+detail" in SQLPage gets no response Key: SPARK-13102 URL: https://issues.apache.org/jira/browse/SPARK-13102 Project: Spark Issue Type: Bug Reporter: KaiXinXIaoLei I run a query using ThriftServer and open the web UI using IE11. When I click "+detail" in SQLPage, there is no response. And when I click "DAG Visualization" in StagesPage, I get nothing.
[jira] [Created] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch
Deenar Toraskar created SPARK-13101: --- Summary: Dataset complex types mapping to DataFrame (element nullability) mismatch Key: SPARK-13101 URL: https://issues.apache.org/jira/browse/SPARK-13101 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.1 Reporter: Deenar Toraskar Fix For: 1.6.1 There seems to be a regression between 1.6.0 and 1.6.1 (snapshot build). By default a Scala Seq[Double] is mapped by Spark as an ArrayType with a nullable element |-- valuations: array (nullable = true) ||-- element: double (containsNull = true) This could be read back as a Dataset in Spark 1.6.0 val df = sqlContext.table("valuations").as[Valuation] But with Spark 1.6.1 the same fails with val df = sqlContext.table("valuations").as[Valuation] org.apache.spark.sql.AnalysisException: cannot resolve 'cast(valuations as array<double>)' due to data type mismatch: cannot cast ArrayType(DoubleType,true) to ArrayType(DoubleType,false); Here are the classes I am using case class Valuation(tradeId : String, counterparty: String, nettingAgreement: String, wrongWay: Boolean, valuations : Seq[Double], /* one per scenario */ timeInterval: Int, jobId: String) /* used for hdfs partitioning */ val vals : Seq[Valuation] = Seq() val valsDF = sqlContext.sparkContext.parallelize(vals).toDF valsDF.write.partitionBy("jobId").mode(SaveMode.Overwrite).saveAsTable("valuations") Even the following gives the same result val valsDF = vals.toDS.toDF
[jira] [Assigned] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala
[ https://issues.apache.org/jira/browse/SPARK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13100: Assignee: (was: Apache Spark) > improving the performance of stringToDate method in DateTimeUtils.scala > --- > > Key: SPARK-13100 > URL: https://issues.apache.org/jira/browse/SPARK-13100 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.2, 1.6.0 >Reporter: Yang Wang > Labels: performance > Attachments: screenshot-1.png > > > In the stringToDate method in DateTimeUtils.scala, in order to create a > Calendar instance we create a brand new TimeZone instance every time by > calling TimeZone.getTimeZone("GMT"). In jdk1.7, however, this method is > synchronized, thus such an approach can cause significant performance loss. > Since the same time zone is used each time we call that method, I think we > should create a val in the DateTimeUtils singleton object to hold that > TimeZone, and use it every time.
[jira] [Commented] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala
[ https://issues.apache.org/jira/browse/SPARK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124810#comment-15124810 ] Apache Spark commented on SPARK-13100: -- User 'wangyang1992' has created a pull request for this issue: https://github.com/apache/spark/pull/10994 > improving the performance of stringToDate method in DateTimeUtils.scala > --- > > Key: SPARK-13100 > URL: https://issues.apache.org/jira/browse/SPARK-13100 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.2, 1.6.0 >Reporter: Yang Wang > Labels: performance > Attachments: screenshot-1.png > > > In the stringToDate method in DateTimeUtils.scala, in order to create a > Calendar instance we create a brand new TimeZone instance every time by > calling TimeZone.getTimeZone("GMT"). In jdk1.7, however, this method is > synchronized, thus such an approach can cause significant performance loss. > Since the same time zone is used each time we call that method, I think we > should create a val in the DateTimeUtils singleton object to hold that > TimeZone, and use it every time.
[jira] [Assigned] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala
[ https://issues.apache.org/jira/browse/SPARK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13100: Assignee: Apache Spark > improving the performance of stringToDate method in DateTimeUtils.scala > --- > > Key: SPARK-13100 > URL: https://issues.apache.org/jira/browse/SPARK-13100 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.2, 1.6.0 >Reporter: Yang Wang >Assignee: Apache Spark > Labels: performance > Attachments: screenshot-1.png > > > In the stringToDate method in DateTimeUtils.scala, in order to create a > Calendar instance we create a brand new TimeZone instance every time by > calling TimeZone.getTimeZone("GMT"). In jdk1.7, however, this method is > synchronized, thus such an approach can cause significant performance loss. > Since the same time zone is used each time we call that method, I think we > should create a val in the DateTimeUtils singleton object to hold that > TimeZone, and use it every time.
[jira] [Updated] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala
[ https://issues.apache.org/jira/browse/SPARK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Wang updated SPARK-13100: -- Attachment: screenshot-1.png > improving the performance of stringToDate method in DateTimeUtils.scala > --- > > Key: SPARK-13100 > URL: https://issues.apache.org/jira/browse/SPARK-13100 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.2, 1.6.0 >Reporter: Yang Wang > Labels: performance > Attachments: screenshot-1.png > > > In the stringToDate method in DateTimeUtils.scala, in order to create a > Calendar instance we create a brand new TimeZone instance every time by > calling TimeZone.getTimeZone("GMT"). In jdk1.7, however, this method is > synchronized, thus such an approach can cause significant performance loss. > Since the same time zone is used each time we call that method, I think we > should create a val in the DateTimeUtils singleton object to hold that > TimeZone, and use it every time.
[jira] [Created] (SPARK-13100) improving the performance of stringToDate method in DateTimeUtils.scala
Yang Wang created SPARK-13100: - Summary: improving the performance of stringToDate method in DateTimeUtils.scala Key: SPARK-13100 URL: https://issues.apache.org/jira/browse/SPARK-13100 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.6.0, 1.5.2 Reporter: Yang Wang In the stringToDate method in DateTimeUtils.scala, in order to create a Calendar instance we create a brand new TimeZone instance every time by calling TimeZone.getTimeZone("GMT"). In jdk1.7, however, this method is synchronized, thus such an approach can cause significant performance loss. Since the same time zone is used each time we call that method, I think we should create a val in the DateTimeUtils singleton object to hold that TimeZone, and use it every time.
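The proposed fix is Scala-specific (hoist TimeZone.getTimeZone("GMT") into a val on the DateTimeUtils object), but the shape of the change is language-neutral: move a per-call construction that is expensive (or, in JDK 7's case, synchronized) up to singleton scope and reuse it. A hedged Python sketch of the before/after shape; the function names and date format are illustrative, not Spark's:

```python
from datetime import datetime, timezone

# Before: obtain the zone object on every call, which mirrors calling
# TimeZone.getTimeZone("GMT") inside stringToDate on each parse.
def string_to_date_slow(s: str) -> datetime:
    tz = timezone.utc  # stands in for the per-call lookup
    return datetime.strptime(s, "%Y-%m-%d").replace(tzinfo=tz)

# After: create the zone once at module ("singleton object") scope and reuse it.
_GMT = timezone.utc

def string_to_date_fast(s: str) -> datetime:
    return datetime.strptime(s, "%Y-%m-%d").replace(tzinfo=_GMT)

# Both variants parse identically; only the allocation/lookup pattern differs.
assert string_to_date_slow("2016-01-30") == string_to_date_fast("2016-01-30")
```

The behavioral caveat for the real change is the same as for any hoisted constant: it is safe only because the zone object is immutable and the same zone is wanted on every call.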
[jira] [Updated] (SPARK-6363) Switch to Scala 2.11 for default build
[ https://issues.apache.org/jira/browse/SPARK-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-6363: --- Summary: Switch to Scala 2.11 for default build (was: make scala 2.11 default language) > Switch to Scala 2.11 for default build > -- > > Key: SPARK-6363 > URL: https://issues.apache.org/jira/browse/SPARK-6363 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: antonkulaga >Assignee: Josh Rosen >Priority: Minor > Labels: releasenotes > Fix For: 2.0.0 > > > Most libraries have already moved to 2.11 and many are starting to drop 2.10 > support, so it would be better if Spark binaries were built with Scala > 2.11 by default.
[jira] [Updated] (SPARK-6363) make scala 2.11 default language
[ https://issues.apache.org/jira/browse/SPARK-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-6363: --- Labels: releasenotes (was: scala) > make scala 2.11 default language > > > Key: SPARK-6363 > URL: https://issues.apache.org/jira/browse/SPARK-6363 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: antonkulaga >Assignee: Josh Rosen >Priority: Minor > Labels: releasenotes > Fix For: 2.0.0 > > > Most libraries have already moved to 2.11 and many are starting to drop 2.10 > support, so it would be better if Spark binaries were built with Scala > 2.11 by default.
[jira] [Resolved] (SPARK-6363) make scala 2.11 default language
[ https://issues.apache.org/jira/browse/SPARK-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-6363. Resolution: Fixed Fix Version/s: 2.0.0 > make scala 2.11 default language > > > Key: SPARK-6363 > URL: https://issues.apache.org/jira/browse/SPARK-6363 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: antonkulaga >Assignee: Josh Rosen >Priority: Minor > Labels: releasenotes > Fix For: 2.0.0 > > > Most libraries have already moved to 2.11 and many are starting to drop 2.10 > support, so it would be better if Spark binaries were built with Scala > 2.11 by default.