[jira] [Resolved] (SPARK-7228) SparkR public API for 1.4 release
[ https://issues.apache.org/jira/browse/SPARK-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-7228. -- Resolution: Fixed Resolving as all the sub-tasks have now been resolved > SparkR public API for 1.4 release > - > > Key: SPARK-7228 > URL: https://issues.apache.org/jira/browse/SPARK-7228 > Project: Spark > Issue Type: Umbrella > Components: SparkR >Affects Versions: 1.4.0 >Reporter: Shivaram Venkataraman >Assignee: Shivaram Venkataraman >Priority: Blocker > > This is an umbrella ticket to track the public APIs and documentation to be > released as a part of SparkR in the 1.4 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536227#comment-14536227 ] Apache Spark commented on SPARK-3928: - User 'tkyaw' has created a pull request for this issue: https://github.com/apache/spark/pull/6025 > Support wildcard matches on Parquet files > - > > Key: SPARK-3928 > URL: https://issues.apache.org/jira/browse/SPARK-3928 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Reporter: Nicholas Chammas >Assignee: Cheng Lian >Priority: Minor > Fix For: 1.3.0 > > > {{SparkContext.textFile()}} supports patterns like {{part-*}} and > {{2014-\?\?-\?\?}}. > It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
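For context, a minimal Scala sketch of the glob behaviour this ticket asks for (the paths below are made up): {{textFile()}} already expands Hadoop glob patterns, and the request is for the Parquet reader to accept the same patterns.
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("glob-example"))
val sqlContext = new SQLContext(sc)

// Works today: textFile() expands glob patterns such as part-* and 2014-??-??
val logs = sc.textFile("hdfs:///data/logs/2014-??-??/part-*")

// What the ticket requests: the same pattern support when reading Parquet
val events = sqlContext.parquetFile("hdfs:///data/events/part-*")
{code}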
[jira] [Closed] (SPARK-7468) DAG visualization: certain action operators should not be scopes
[ https://issues.apache.org/jira/browse/SPARK-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-7468. Resolution: Won't Fix It's hard to apply a general rule that decides whether an operation should be marked "withScope". Even an action may be marked as such in case we make it use an operation internally in the future (e.g. if we change `take` implementation to use a random `map` or `filter` somewhere). Closing as won't fix because in the worst case we show a higher level construct, which is familiar to the user, rather than a low level operation used internally (e.g. otherwise `map` shows up on the UI even though the user didn't explicitly call `map`). > DAG visualization: certain action operators should not be scopes > > > Key: SPARK-7468 > URL: https://issues.apache.org/jira/browse/SPARK-7468 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 1.4.0 >Reporter: Andrew Or >Assignee: Andrew Or > > What does it mean to have a "take" scope and an RDD in it? This is somewhat > confusing. Low hanging fruit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7498) Params.setDefault should not use varargs annotation
[ https://issues.apache.org/jira/browse/SPARK-7498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-7498. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6021 [https://github.com/apache/spark/pull/6021] > Params.setDefault should not use varargs annotation > --- > > Key: SPARK-7498 > URL: https://issues.apache.org/jira/browse/SPARK-7498 > Project: Spark > Issue Type: Bug > Components: Java API, ML >Affects Versions: 1.4.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley > Fix For: 1.4.0 > > > In [SPARK-7429] and PR [https://github.com/apache/spark/pull/5960], I added > the varargs annotation to Params.setDefault which takes a variable number of > ParamPairs. It worked locally and on Jenkins for me. > However, @mengxr reported issues compiling on his machine. So I'm reverting > the change introduced in [https://github.com/apache/spark/pull/5960] by > removing varargs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
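For readers unfamiliar with the annotation, a simplified sketch of what {{@varargs}} does (the class and signature below are illustrative, not the real {{Params.setDefault}}): it makes scalac emit an extra Java-style varargs overload so Java callers can pass a variable number of arguments directly; the revert drops the annotation because the generated overload failed to compile in some environments.
{code}
import scala.annotation.varargs

class ParamsSketch {
  // With @varargs, Java code can call setDefault(pair1, pair2, ...) directly;
  // without it, Java callers must build and pass a Scala Seq themselves.
  @varargs
  def setDefault(paramPairs: (String, Any)*): Unit = {
    paramPairs.foreach { case (name, value) => println(s"default $name = $value") }
  }
}
{code}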
[jira] [Updated] (SPARK-7262) Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML package
[ https://issues.apache.org/jira/browse/SPARK-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7262: - Assignee: DB Tsai > Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML > package > > > Key: SPARK-7262 > URL: https://issues.apache.org/jira/browse/SPARK-7262 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: DB Tsai >Assignee: DB Tsai > Fix For: 1.4.0 > > > 1) Handle scaling and addBias internally. > 2) L1/L2 elasticnet using OWLQN optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7262) Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML package
[ https://issues.apache.org/jira/browse/SPARK-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-7262. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 5967 [https://github.com/apache/spark/pull/5967] > Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML > package > > > Key: SPARK-7262 > URL: https://issues.apache.org/jira/browse/SPARK-7262 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: DB Tsai > Fix For: 1.4.0 > > > 1) Handle scaling and addBias internally. > 2) L1/L2 elasticnet using OWLQN optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
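For context, a minimal usage sketch of the resolved feature; it assumes a {{training}} DataFrame with the standard "label" and "features" columns.
{code}
import org.apache.spark.ml.classification.LogisticRegression

// `training`: DataFrame with "label" and "features" columns, defined elsewhere
val lr = new LogisticRegression()
  .setMaxIter(100)
  .setRegParam(0.1)          // overall regularization strength
  .setElasticNetParam(0.5)   // mixing parameter: 0.0 = pure L2, 1.0 = pure L1
val model = lr.fit(training) // scaling and intercept handling are done internally
{code}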
[jira] [Updated] (SPARK-7463) Dag visualization improvements
[ https://issues.apache.org/jira/browse/SPARK-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-7463: - Target Version/s: 1.4.0, 1.5.0 (was: 1.5.0) > Dag visualization improvements > -- > > Key: SPARK-7463 > URL: https://issues.apache.org/jira/browse/SPARK-7463 > Project: Spark > Issue Type: Umbrella > Components: Web UI >Affects Versions: 1.4.0 >Reporter: Andrew Or >Assignee: Andrew Or > > This is the umbrella JIRA for improvements or bug fixes to the DAG > visualization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7502) DAG visualization: handle removed stages gracefully
Andrew Or created SPARK-7502: Summary: DAG visualization: handle removed stages gracefully Key: SPARK-7502 URL: https://issues.apache.org/jira/browse/SPARK-7502 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Andrew Or Right now we get a blank viz in the job page if this happens. Then the JS error message in the developer console looks something like "Warning: SVG view box cannot be 'Nan Nan Nan Nan'". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536114#comment-14536114 ] Yana Kadiyska commented on SPARK-3928: -- [~tkyaw] Your suggested workaround does work. One question though -- what are the implications of turning off "spark.sql.parquet.useDataSourceApi"? My particular concern is with predicate pushdowns into parquet -- am I going to lose these? (It's hard to tell from the UI if pushdown is happening correctly.) Also, can you clarify whether you still plan to fix this for 1.4, or whether "New parquet implementation does not contain wild card support yet" means we'd have to live with disabling spark.sql.parquet.useDataSourceApi for the foreseeable future? > Support wildcard matches on Parquet files > - > > Key: SPARK-3928 > URL: https://issues.apache.org/jira/browse/SPARK-3928 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Reporter: Nicholas Chammas >Assignee: Cheng Lian >Priority: Minor > Fix For: 1.3.0 > > > {{SparkContext.textFile()}} supports patterns like {{part-*}} and > {{2014-\?\?-\?\?}}. > It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
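For reference, a sketch of the workaround discussed in the comment above (the table path is hypothetical); whether predicate pushdown survives on the old code path is exactly the open question raised there.
{code}
// Fall back to the old Parquet code path, which still expands glob patterns
// (assumes an existing `sqlContext`)
sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "false")
val df = sqlContext.parquetFile("hdfs:///warehouse/events/2015-??-??/part-*")
{code}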
[jira] [Updated] (SPARK-7501) DAG visualization: show DStream operations for Streaming
[ https://issues.apache.org/jira/browse/SPARK-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-7501: - Description: Similar to SQL in SPARK-7469, we should show higher level constructs that the user is more familiar with. (was: Similar to SQL, we should show higher level constructs that the user is more familiar with.) > DAG visualization: show DStream operations for Streaming > > > Key: SPARK-7501 > URL: https://issues.apache.org/jira/browse/SPARK-7501 > Project: Spark > Issue Type: Bug > Components: Streaming, Web UI >Affects Versions: 1.4.0 >Reporter: Andrew Or >Assignee: Andrew Or > > Similar to SQL in SPARK-7469, we should show higher level constructs that the > user is more familiar with. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7501) DAG visualization: show DStream operations for Streaming
Andrew Or created SPARK-7501: Summary: DAG visualization: show DStream operations for Streaming Key: SPARK-7501 URL: https://issues.apache.org/jira/browse/SPARK-7501 Project: Spark Issue Type: Bug Components: Streaming, Web UI Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Andrew Or Similar to SQL, we should show higher level constructs that the user is more familiar with. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7500) DAG visualization: cluster name bleeds beyond the cluster
[ https://issues.apache.org/jira/browse/SPARK-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-7500: - Attachment: long names.png > DAG visualization: cluster name bleeds beyond the cluster > - > > Key: SPARK-7500 > URL: https://issues.apache.org/jira/browse/SPARK-7500 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.4.0 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Minor > Attachments: long names.png > > > This happens only for long names. See screenshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7375) Avoid defensive copying in SQL exchange operator when sort-based shuffle buffers data in serialized form
[ https://issues.apache.org/jira/browse/SPARK-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-7375. - Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 5948 [https://github.com/apache/spark/pull/5948] > Avoid defensive copying in SQL exchange operator when sort-based shuffle > buffers data in serialized form > > > Key: SPARK-7375 > URL: https://issues.apache.org/jira/browse/SPARK-7375 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Josh Rosen >Assignee: Josh Rosen > Fix For: 1.4.0 > > > The original sort-based shuffle buffers shuffle input records in memory while > sorting them. This causes problems when mutable records are presented to the > shuffle, which happens in Spark SQL's Exchange operator. To work around this > issue, SPARK-2967 and SPARK-4479 added defensive copying of shuffle inputs in > the Exchange operator when sort-based shuffle is enabled. > I think that [~sandyr]'s recent patch for enabling serialization of records > in sort-based shuffle (SPARK-4550) and my proposed {{unsafe}}-based shuffle > path (SPARK-7081) may allow us to avoid this defensive copying in certain > cases (since our patches cause records to be serialized one-at-a-time and > remove the buffering of deserialized records). > As mentioned in SPARK-4479, a long-term fix for this issue might be to add > hooks for informing the shuffle about object (im)mutability in order to allow > the shuffle layer to decide whether to copy. In the meantime, though, I think > that we should just extend the checks added in SPARK-4479 to avoid copies > when these new serialized sort paths are used. > /cc [~rxin] [~marmbrus] [~yhuai] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7500) DAG visualization: cluster name bleeds beyond the cluster
Andrew Or created SPARK-7500: Summary: DAG visualization: cluster name bleeds beyond the cluster Key: SPARK-7500 URL: https://issues.apache.org/jira/browse/SPARK-7500 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Minor Attachments: long names.png This happens only for long names. See screenshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7490) MapOutputTracker: close input streams to free native memory
[ https://issues.apache.org/jira/browse/SPARK-7490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-7490. Resolution: Fixed Target Version/s: 1.4.0 > MapOutputTracker: close input streams to free native memory > --- > > Key: SPARK-7490 > URL: https://issues.apache.org/jira/browse/SPARK-7490 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Evan Jones >Assignee: Evan Jones >Priority: Minor > Fix For: 1.4.0 > > > GZIPInputStream allocates native memory that is not freed until close() or > when the finalizer runs. It is best to close() these streams explicitly to > avoid native memory leaks > Pull request here: https://github.com/apache/spark/pull/5982 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-7490) MapOutputTracker: close input streams to free native memory
[ https://issues.apache.org/jira/browse/SPARK-7490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reopened SPARK-7490: -- > MapOutputTracker: close input streams to free native memory > --- > > Key: SPARK-7490 > URL: https://issues.apache.org/jira/browse/SPARK-7490 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Evan Jones >Assignee: Evan Jones >Priority: Minor > Fix For: 1.4.0 > > > GZIPInputStream allocates native memory that is not freed until close() or > when the finalizer runs. It is best to close() these streams explicitly to > avoid native memory leaks > Pull request here: https://github.com/apache/spark/pull/5982 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7490) MapOutputTracker: close input streams to free native memory
[ https://issues.apache.org/jira/browse/SPARK-7490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-7490: - Fix Version/s: (was: 1.2.3) (was: 1.3.2) > MapOutputTracker: close input streams to free native memory > --- > > Key: SPARK-7490 > URL: https://issues.apache.org/jira/browse/SPARK-7490 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Evan Jones >Assignee: Evan Jones >Priority: Minor > Fix For: 1.4.0 > > > GZIPInputStream allocates native memory that is not freed until close() or > when the finalizer runs. It is best to close() these streams explicitly to > avoid native memory leaks > Pull request here: https://github.com/apache/spark/pull/5982 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7427) Make sharedParams match in Scala, Python
[ https://issues.apache.org/jira/browse/SPARK-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7427: --- Assignee: (was: Apache Spark) > Make sharedParams match in Scala, Python > > > Key: SPARK-7427 > URL: https://issues.apache.org/jira/browse/SPARK-7427 > Project: Spark > Issue Type: Documentation > Components: ML, PySpark >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > The documentation for shared Params differs a little between Scala, Python. > The Python docs should be modified to match the Scala ones. This will > require modifying the sharedParamsCodeGen files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7427) Make sharedParams match in Scala, Python
[ https://issues.apache.org/jira/browse/SPARK-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7427: --- Assignee: Apache Spark > Make sharedParams match in Scala, Python > > > Key: SPARK-7427 > URL: https://issues.apache.org/jira/browse/SPARK-7427 > Project: Spark > Issue Type: Documentation > Components: ML, PySpark >Reporter: Joseph K. Bradley >Assignee: Apache Spark >Priority: Trivial > Labels: starter > > The documentation for shared Params differs a little between Scala, Python. > The Python docs should be modified to match the Scala ones. This will > require modifying the sharedParamsCodeGen files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7427) Make sharedParams match in Scala, Python
[ https://issues.apache.org/jira/browse/SPARK-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536108#comment-14536108 ] Apache Spark commented on SPARK-7427: - User 'gweidner' has created a pull request for this issue: https://github.com/apache/spark/pull/6023 > Make sharedParams match in Scala, Python > > > Key: SPARK-7427 > URL: https://issues.apache.org/jira/browse/SPARK-7427 > Project: Spark > Issue Type: Documentation > Components: ML, PySpark >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > The documentation for shared Params differs a little between Scala, Python. > The Python docs should be modified to match the Scala ones. This will > require modifying the sharedParamsCodeGen files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7231) Make SparkR DataFrame API more dplyr friendly
[ https://issues.apache.org/jira/browse/SPARK-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-7231: - Fix Version/s: 1.4.0 > Make SparkR DataFrame API more dplyr friendly > - > > Key: SPARK-7231 > URL: https://issues.apache.org/jira/browse/SPARK-7231 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 1.4.0 >Reporter: Shivaram Venkataraman >Assignee: Shivaram Venkataraman >Priority: Critical > Fix For: 1.4.0 > > > This ticket tracks auditing the SparkR dataframe API and ensuring that the > API is friendly to existing R users. > Mainly we wish to make sure the DataFrame API we expose has functions similar > to those which exist on native R data frames and in popular packages like > `dplyr`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7231) Make SparkR DataFrame API more dplyr friendly
[ https://issues.apache.org/jira/browse/SPARK-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536091#comment-14536091 ] Shivaram Venkataraman commented on SPARK-7231: -- Fixed by https://github.com/apache/spark/pull/6005 > Make SparkR DataFrame API more dplyr friendly > - > > Key: SPARK-7231 > URL: https://issues.apache.org/jira/browse/SPARK-7231 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 1.4.0 >Reporter: Shivaram Venkataraman >Assignee: Shivaram Venkataraman >Priority: Critical > > This ticket tracks auditing the SparkR dataframe API and ensuring that the > API is friendly to existing R users. > Mainly we wish to make sure the DataFrame API we expose has functions similar > to those which exist on native R data frames and in popular packages like > `dplyr`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7231) Make SparkR DataFrame API more dplyr friendly
[ https://issues.apache.org/jira/browse/SPARK-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-7231. -- Resolution: Fixed > Make SparkR DataFrame API more dplyr friendly > - > > Key: SPARK-7231 > URL: https://issues.apache.org/jira/browse/SPARK-7231 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 1.4.0 >Reporter: Shivaram Venkataraman >Assignee: Shivaram Venkataraman >Priority: Critical > > This ticket tracks auditing the SparkR dataframe API and ensuring that the > API is friendly to existing R users. > Mainly we wish to make sure the DataFrame API we expose has functions similar > to those which exist on native R data frames and in popular packages like > `dplyr`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7497) test_count_by_value_and_window is flaky
[ https://issues.apache.org/jira/browse/SPARK-7497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536060#comment-14536060 ] Shivaram Venkataraman commented on SPARK-7497: -- I've seen this a couple of times as well very recently > test_count_by_value_and_window is flaky > --- > > Key: SPARK-7497 > URL: https://issues.apache.org/jira/browse/SPARK-7497 > Project: Spark > Issue Type: Bug > Components: PySpark, Streaming >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng >Assignee: Tathagata Das > Labels: flaky-test > > Saw this test failure in > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32268/console > {code} > == > FAIL: test_count_by_value_and_window (__main__.WindowFunctionTests) > -- > Traceback (most recent call last): > File "pyspark/streaming/tests.py", line 418, in > test_count_by_value_and_window > self._test_func(input, func, expected) > File "pyspark/streaming/tests.py", line 133, in _test_func > self.assertEqual(expected, result) > AssertionError: Lists differ: [[1], [2], [3], [4], [5], [6], [6], [6], [6], > [6]] != [[1], [2], [3], [4], [5], [6], [6], [6]] > First list contains 2 additional elements. > First extra element 8: > [6] > - [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]] > ? -- > + [[1], [2], [3], [4], [5], [6], [6], [6]] > -- > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7473) Use reservoir sample in RandomForest when choosing features per node
[ https://issues.apache.org/jira/browse/SPARK-7473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7473: --- Assignee: (was: Apache Spark) > Use reservoir sample in RandomForest when choosing features per node > > > Key: SPARK-7473 > URL: https://issues.apache.org/jira/browse/SPARK-7473 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Joseph K. Bradley >Priority: Trivial > > See sampling in selectNodesToSplit method -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7473) Use reservoir sample in RandomForest when choosing features per node
[ https://issues.apache.org/jira/browse/SPARK-7473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536058#comment-14536058 ] Apache Spark commented on SPARK-7473: - User 'AiHe' has created a pull request for this issue: https://github.com/apache/spark/pull/5988 > Use reservoir sample in RandomForest when choosing features per node > > > Key: SPARK-7473 > URL: https://issues.apache.org/jira/browse/SPARK-7473 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Joseph K. Bradley >Priority: Trivial > > See sampling in selectNodesToSplit method -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7473) Use reservoir sample in RandomForest when choosing features per node
[ https://issues.apache.org/jira/browse/SPARK-7473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7473: --- Assignee: Apache Spark > Use reservoir sample in RandomForest when choosing features per node > > > Key: SPARK-7473 > URL: https://issues.apache.org/jira/browse/SPARK-7473 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Joseph K. Bradley >Assignee: Apache Spark >Priority: Trivial > > See sampling in selectNodesToSplit method -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
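For context, a generic reservoir-sampling sketch (not the actual {{selectNodesToSplit}} code): it keeps k items drawn uniformly at random from a stream of unknown length, which is the technique the ticket suggests for choosing the per-node feature subset.
{code}
import scala.util.Random

def reservoirSample[T: scala.reflect.ClassTag](input: Iterator[T], k: Int, rng: Random): Array[T] = {
  val reservoir = new Array[T](k)
  var i = 0
  // Fill the reservoir with the first k items
  while (i < k && input.hasNext) { reservoir(i) = input.next(); i += 1 }
  // Each later item replaces a random slot with probability k / (i + 1)
  while (input.hasNext) {
    val item = input.next()
    val j = rng.nextInt(i + 1)
    if (j < k) reservoir(j) = item
    i += 1
  }
  if (i < k) reservoir.take(i) else reservoir
}
{code}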
[jira] [Commented] (SPARK-7485) Remove python artifacts from the assembly jar
[ https://issues.apache.org/jira/browse/SPARK-7485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536026#comment-14536026 ] Apache Spark commented on SPARK-7485: - User 'vanzin' has created a pull request for this issue: https://github.com/apache/spark/pull/6022 > Remove python artifacts from the assembly jar > - > > Key: SPARK-7485 > URL: https://issues.apache.org/jira/browse/SPARK-7485 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.4.0 >Reporter: Thomas Graves > > We changed it so that we distribute the python files via a zip file in > SPARK-6869. With that we should remove the python files from the assembly > jar. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7485) Remove python artifacts from the assembly jar
[ https://issues.apache.org/jira/browse/SPARK-7485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7485: --- Assignee: (was: Apache Spark) > Remove python artifacts from the assembly jar > - > > Key: SPARK-7485 > URL: https://issues.apache.org/jira/browse/SPARK-7485 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.4.0 >Reporter: Thomas Graves > > We changed it so that we distribute the python files via a zip file in > SPARK-6869. With that we should remove the python files from the assembly > jar. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7485) Remove python artifacts from the assembly jar
[ https://issues.apache.org/jira/browse/SPARK-7485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7485: --- Assignee: Apache Spark > Remove python artifacts from the assembly jar > - > > Key: SPARK-7485 > URL: https://issues.apache.org/jira/browse/SPARK-7485 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.4.0 >Reporter: Thomas Graves >Assignee: Apache Spark > > We changed it so that we distribute the python files via a zip file in > SPARK-6869. With that we should remove the python files from the assembly > jar. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7488) Python API for ml.recommendation
[ https://issues.apache.org/jira/browse/SPARK-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-7488. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6015 [https://github.com/apache/spark/pull/6015] > Python API for ml.recommendation > > > Key: SPARK-7488 > URL: https://issues.apache.org/jira/browse/SPARK-7488 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Reporter: Burak Yavuz > Fix For: 1.4.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7498) Params.setDefault should not use varargs annotation
[ https://issues.apache.org/jira/browse/SPARK-7498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7498: --- Assignee: Apache Spark (was: Joseph K. Bradley) > Params.setDefault should not use varargs annotation > --- > > Key: SPARK-7498 > URL: https://issues.apache.org/jira/browse/SPARK-7498 > Project: Spark > Issue Type: Bug > Components: Java API, ML >Affects Versions: 1.4.0 >Reporter: Joseph K. Bradley >Assignee: Apache Spark > > In [SPARK-7429] and PR [https://github.com/apache/spark/pull/5960], I added > the varargs annotation to Params.setDefault which takes a variable number of > ParamPairs. It worked locally and on Jenkins for me. > However, @mengxr reported issues compiling on his machine. So I'm reverting > the change introduced in [https://github.com/apache/spark/pull/5960] by > removing varargs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7498) Params.setDefault should not use varargs annotation
[ https://issues.apache.org/jira/browse/SPARK-7498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7498: --- Assignee: Joseph K. Bradley (was: Apache Spark) > Params.setDefault should not use varargs annotation > --- > > Key: SPARK-7498 > URL: https://issues.apache.org/jira/browse/SPARK-7498 > Project: Spark > Issue Type: Bug > Components: Java API, ML >Affects Versions: 1.4.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley > > In [SPARK-7429] and PR [https://github.com/apache/spark/pull/5960], I added > the varargs annotation to Params.setDefault which takes a variable number of > ParamPairs. It worked locally and on Jenkins for me. > However, @mengxr reported issues compiling on his machine. So I'm reverting > the change introduced in [https://github.com/apache/spark/pull/5960] by > removing varargs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7498) Params.setDefault should not use varargs annotation
[ https://issues.apache.org/jira/browse/SPARK-7498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536019#comment-14536019 ] Apache Spark commented on SPARK-7498: - User 'jkbradley' has created a pull request for this issue: https://github.com/apache/spark/pull/6021 > Params.setDefault should not use varargs annotation > --- > > Key: SPARK-7498 > URL: https://issues.apache.org/jira/browse/SPARK-7498 > Project: Spark > Issue Type: Bug > Components: Java API, ML >Affects Versions: 1.4.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley > > In [SPARK-7429] and PR [https://github.com/apache/spark/pull/5960], I added > the varargs annotation to Params.setDefault which takes a variable number of > ParamPairs. It worked locally and on Jenkins for me. > However, @mengxr reported issues compiling on his machine. So I'm reverting > the change introduced in [https://github.com/apache/spark/pull/5960] by > removing varargs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7488) Python API for ml.recommendation
[ https://issues.apache.org/jira/browse/SPARK-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7488: - Assignee: Burak Yavuz > Python API for ml.recommendation > > > Key: SPARK-7488 > URL: https://issues.apache.org/jira/browse/SPARK-7488 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Reporter: Burak Yavuz >Assignee: Burak Yavuz > Fix For: 1.4.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7499) Investigate how to specify columns in SparkR without $ or strings
Shivaram Venkataraman created SPARK-7499: Summary: Investigate how to specify columns in SparkR without $ or strings Key: SPARK-7499 URL: https://issues.apache.org/jira/browse/SPARK-7499 Project: Spark Issue Type: Improvement Components: SparkR Reporter: Shivaram Venkataraman Right now in SparkR we need to specify columns using `$` or strings. For example, to run a select we would do {code} df1 <- select(df, df$age > 10) {code} It would be good to infer the set of columns in a dataframe automatically and resolve symbols for column names. For example {code} df1 <- select(df, age > 10) {code} One way to do this is to build an environment mapping all the column names to column handles and then use `substitute(arg, env = columnNameEnv)` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6955) Do not let Yarn Shuffle Server retry its server port.
[ https://issues.apache.org/jira/browse/SPARK-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6955. Resolution: Fixed Fix Version/s: 1.4.0 Assignee: Aaron Davidson (was: SaintBacchus) > Do not let Yarn Shuffle Server retry its server port. > - > > Key: SPARK-6955 > URL: https://issues.apache.org/jira/browse/SPARK-6955 > Project: Spark > Issue Type: Bug > Components: Shuffle, YARN >Affects Versions: 1.2.0 >Reporter: SaintBacchus >Assignee: Aaron Davidson >Priority: Minor > Fix For: 1.4.0 > > > It's better to let the NodeManager fail to start rather than retry on another > port when `spark.shuffle.service.port` conflicts while starting the Spark Yarn > Shuffle Server, because the port retry makes the shuffle port inconsistent and > clients fail to find the port. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
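For reference, a sketch of the Spark-side settings involved; the port value shown is just the conventional default. On YARN the shuffle service itself runs inside the NodeManager, which is why a silent port retry leaves applications looking for the service on the wrong port.
{code}
import org.apache.spark.SparkConf

// The application must agree with the NodeManager-hosted shuffle service on this port,
// so the service should fail fast on a conflict rather than silently rebind elsewhere.
val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.shuffle.service.port", "7337")
{code}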
[jira] [Commented] (SPARK-7427) Make sharedParams match in Scala, Python
[ https://issues.apache.org/jira/browse/SPARK-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535905#comment-14535905 ] Glenn Weidner commented on SPARK-7427: -- Thank you! I see mismatch now - for example: HasMaxIter in Scala: "max number of iterations (>= 0)" HasMaxIter in Python: "max number of iterations" I'll modify _shared_params_code_gen.py based on doc strings in SharedParamsCodeGen.scala and regenerate shared.py. > Make sharedParams match in Scala, Python > > > Key: SPARK-7427 > URL: https://issues.apache.org/jira/browse/SPARK-7427 > Project: Spark > Issue Type: Documentation > Components: ML, PySpark >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > The documentation for shared Params differs a little between Scala, Python. > The Python docs should be modified to match the Scala ones. This will > require modifying the sharedParamsCodeGen files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7498) Params.setDefault should not use varargs annotation
Joseph K. Bradley created SPARK-7498: Summary: Params.setDefault should not use varargs annotation Key: SPARK-7498 URL: https://issues.apache.org/jira/browse/SPARK-7498 Project: Spark Issue Type: Bug Components: Java API, ML Affects Versions: 1.4.0 Reporter: Joseph K. Bradley Assignee: Joseph K. Bradley In [SPARK-7429] and PR [https://github.com/apache/spark/pull/5960], I added the varargs annotation to Params.setDefault which takes a variable number of ParamPairs. It worked locally and on Jenkins for me. However, @mengxr reported issues compiling on his machine. So I'm reverting the change introduced in [https://github.com/apache/spark/pull/5960] by removing varargs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5913) Python API for ChiSqSelector
[ https://issues.apache.org/jira/browse/SPARK-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-5913. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 5939 [https://github.com/apache/spark/pull/5939] > Python API for ChiSqSelector > > > Key: SPARK-5913 > URL: https://issues.apache.org/jira/browse/SPARK-5913 > Project: Spark > Issue Type: New Feature > Components: MLlib, PySpark >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley >Assignee: Yanbo Liang >Priority: Minor > Fix For: 1.4.0 > > > Add a Python API for mllib.feature.ChiSqSelector -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7427) Make sharedParams match in Scala, Python
[ https://issues.apache.org/jira/browse/SPARK-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535800#comment-14535800 ] Joseph K. Bradley commented on SPARK-7427: -- Each parameter has built-in documentation passed to the Param constructor. That doc should match between Scala and Python. > Make sharedParams match in Scala, Python > > > Key: SPARK-7427 > URL: https://issues.apache.org/jira/browse/SPARK-7427 > Project: Spark > Issue Type: Documentation > Components: ML, PySpark >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > The documentation for shared Params differs a little between Scala, Python. > The Python docs should be modified to match the Scala ones. This will > require modifying the sharedParamsCodeGen files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7497) test_count_by_value_and_window is flaky
Xiangrui Meng created SPARK-7497: Summary: test_count_by_value_and_window is flaky Key: SPARK-7497 URL: https://issues.apache.org/jira/browse/SPARK-7497 Project: Spark Issue Type: Bug Components: PySpark, Streaming Affects Versions: 1.4.0 Reporter: Xiangrui Meng Assignee: Tathagata Das Saw this test failure in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32268/console {code} == FAIL: test_count_by_value_and_window (__main__.WindowFunctionTests) -- Traceback (most recent call last): File "pyspark/streaming/tests.py", line 418, in test_count_by_value_and_window self._test_func(input, func, expected) File "pyspark/streaming/tests.py", line 133, in _test_func self.assertEqual(expected, result) AssertionError: Lists differ: [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]] != [[1], [2], [3], [4], [5], [6], [6], [6]] First list contains 2 additional elements. First extra element 8: [6] - [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]] ? -- + [[1], [2], [3], [4], [5], [6], [6], [6]] -- {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4699) Make caseSensitive configurable in Analyzer.scala
[ https://issues.apache.org/jira/browse/SPARK-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-4699. - Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 5806 [https://github.com/apache/spark/pull/5806] > Make caseSensitive configurable in Analyzer.scala > - > > Key: SPARK-4699 > URL: https://issues.apache.org/jira/browse/SPARK-4699 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.1.0 >Reporter: Jacky Li > Fix For: 1.4.0 > > > Currently, case sensitivity is true by default in Analyzer. It should be > configurable by setting SQLConf in the client application -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
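For context, a sketch of the per-application configuration this change enables; the key name {{spark.sql.caseSensitive}} is assumed here, and {{sqlContext}} is an existing SQLContext.
{code}
// Switch the analyzer to case-insensitive resolution for this application
sqlContext.setConf("spark.sql.caseSensitive", "false")
sqlContext.sql("SELECT NAME FROM people").show() // resolves the `name` column despite the different case
{code}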
[jira] [Commented] (SPARK-7407) Use uid and param name to identify a parameter instead of the param object
[ https://issues.apache.org/jira/browse/SPARK-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535754#comment-14535754 ] Apache Spark commented on SPARK-7407: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/6019 > Use uid and param name to identify a parameter instead of the param object > -- > > Key: SPARK-7407 > URL: https://issues.apache.org/jira/browse/SPARK-7407 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > Transferring parameter values from one to another have been the pain point in > the ML pipeline implementation. Because we use the param object as the key in > the param map, we have to correctly copy them when making a copy of the > transformer, estimator, and models. This becomes complicated when > meta-algorithms are involved. For example, in cross validation: > {code} > val cv = new CrossValidator() > .setEstimator(lr) > .setEstimatorParamMaps(epm) > {code} > When we make a copy of `cv` with extra params that contain estimator params, > {code} > cv.copy(ParamMap(cv.numFolds -> 3, lr.maxIter -> 10)) > {code} > we need to make a copy of the `lr` object as well and map `epm` to use the > new param keys from the old `lr`. This is quite error-prone, especially if > the estimator itself is another meta-algorithm. > Using uid + param name as the key in param maps and using the same uid in > copy (and between estimator/model pairs) would simplify the implementations. > We don't need to change the keys since the copied instance has the same id as > the original instance. And it is easier to find models from a fitted pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7427) Make sharedParams match in Scala, Python
[ https://issues.apache.org/jira/browse/SPARK-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535701#comment-14535701 ] Glenn Weidner commented on SPARK-7427: -- I've also generated the scaladoc by running build/sbt unidoc to compare against generated Python API docs. When all the individual shared parameters (e.g., HasInputCol) in sharedParams.scala created by SharedParamsCodeGen.scala are private, then no html is generated. If public, then the corresponding html is available in browser along with Param, Params, ParamMap, etc. under org.apache.spark.ml.param. [~josephkb] Can you provide a little more description regarding the "documentation for shared Params differs" between Scala and Python? I'm double-checking that my forked repository is in sync with latest from master since I only found sections for feature, classification, tuning, evaluation modules under pyspark.ml in my generated Python API docs. > Make sharedParams match in Scala, Python > > > Key: SPARK-7427 > URL: https://issues.apache.org/jira/browse/SPARK-7427 > Project: Spark > Issue Type: Documentation > Components: ML, PySpark >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > The documentation for shared Params differs a little between Scala, Python. > The Python docs should be modified to match the Scala ones. This will > require modifying the sharedParamsCodeGen files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534582#comment-14534582 ] Thu Kyaw edited comment on SPARK-3928 at 5/8/15 9:52 PM: - New parquet implementation does not contain wild card support yet, but you could still use old version of parquet implementation to get wildcard support. Just turn off sql Configuration. "spark.sql.parquet.useDataSourceApi" ( turn off by setting it false; by default it is true ). was (Author: tkyaw): New parquet implementation does not contain wild card support yet, but you could still use old version parquet implementation to get wildcard support. Just turn off sql Configuration. "spark.sql.parquet.useDataSourceApi" ( to false by default it is true ). > Support wildcard matches on Parquet files > - > > Key: SPARK-3928 > URL: https://issues.apache.org/jira/browse/SPARK-3928 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Reporter: Nicholas Chammas >Assignee: Cheng Lian >Priority: Minor > Fix For: 1.3.0 > > > {{SparkContext.textFile()}} supports patterns like {{part-*}} and > {{2014-\?\?-\?\?}}. > It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534582#comment-14534582 ] Thu Kyaw edited comment on SPARK-3928 at 5/8/15 9:50 PM: - New parquet implementation does not contain wild card support yet, but you could still use old version parquet implementation to get wildcard support. Just turn off sql Configuration. "spark.sql.parquet.useDataSourceApi" ( to false by default it is true ). was (Author: tkyaw): Hello [~lian cheng] please let me know if you want me to work on adding back the glob support. > Support wildcard matches on Parquet files > - > > Key: SPARK-3928 > URL: https://issues.apache.org/jira/browse/SPARK-3928 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Reporter: Nicholas Chammas >Assignee: Cheng Lian >Priority: Minor > Fix For: 1.3.0 > > > {{SparkContext.textFile()}} supports patterns like {{part-*}} and > {{2014-\?\?-\?\?}}. > It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7486) Add the streaming implementation for estimating quantiles and median
[ https://issues.apache.org/jira/browse/SPARK-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley closed SPARK-7486. Resolution: Duplicate > Add the streaming implementation for estimating quantiles and median > > > Key: SPARK-7486 > URL: https://issues.apache.org/jira/browse/SPARK-7486 > Project: Spark > Issue Type: New Feature > Components: ML, SQL >Reporter: Liang-Chi Hsieh > > Streaming implementations that can estimate quantiles and medians are very > useful for ML algorithms and data statistics. > Apache DataFu Pig has this kind of implementation. We can port it to Spark. > Please refer to: > http://datafu.incubator.apache.org/docs/datafu/getting-started.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7486) Add the streaming implementation for estimating quantiles and median
[ https://issues.apache.org/jira/browse/SPARK-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535642#comment-14535642 ] Joseph K. Bradley commented on SPARK-7486: -- OK, I'll close it as a duplicate > Add the streaming implementation for estimating quantiles and median > > > Key: SPARK-7486 > URL: https://issues.apache.org/jira/browse/SPARK-7486 > Project: Spark > Issue Type: New Feature > Components: ML, SQL >Reporter: Liang-Chi Hsieh > > Streaming implementations that can estimate quantiles and medians are very > useful for ML algorithms and data statistics. > Apache DataFu Pig has this kind of implementation. We can port it to Spark. > Please refer to: > http://datafu.incubator.apache.org/docs/datafu/getting-started.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds
[ https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-1517: -- Assignee: (was: Nicholas Chammas) > Publish nightly snapshots of documentation, maven artifacts, and binary builds > -- > > Key: SPARK-1517 > URL: https://issues.apache.org/jira/browse/SPARK-1517 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Reporter: Patrick Wendell >Priority: Blocker > > Should be pretty easy to do with Jenkins. The only thing I can think of that > would be tricky is to set up credentials so that jenkins can publish this > stuff somewhere on apache infra. > Ideally we don't want to have to put a private key on every jenkins box > (since they are otherwise pretty stateless). One idea is to encrypt these > credentials with a passphrase and post them somewhere publicly visible. Then > the jenkins build can download the credentials provided we set a passphrase > in an environment variable in jenkins. There may be simpler solutions as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7390) CovarianceCounter in StatFunctions might calculate incorrect result
[ https://issues.apache.org/jira/browse/SPARK-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7390: - Assignee: Liang-Chi Hsieh > CovarianceCounter in StatFunctions might calculate incorrect result > --- > > Key: SPARK-7390 > URL: https://issues.apache.org/jira/browse/SPARK-7390 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh > Fix For: 1.4.0 > > > CovarianceCounter in StatFunctions has a merging stage. In this merge > function, the other CovarianceCounter object sometimes has a zero count, which > causes the final CovarianceCounter to produce an incorrect result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7390) CovarianceCounter in StatFunctions might calculate incorrect result
[ https://issues.apache.org/jira/browse/SPARK-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-7390. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 5931 [https://github.com/apache/spark/pull/5931] > CovarianceCounter in StatFunctions might calculate incorrect result > --- > > Key: SPARK-7390 > URL: https://issues.apache.org/jira/browse/SPARK-7390 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Liang-Chi Hsieh > Fix For: 1.4.0 > > > CovarianceCounter in StatFunctions has a merging stage. In this merge > function, the other CovarianceCounter object sometimes has a zero count, which > causes the final CovarianceCounter to produce an incorrect result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7137) Add checkInputColumn back to Params and print more info
[ https://issues.apache.org/jira/browse/SPARK-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535624#comment-14535624 ] Rekha Joshi commented on SPARK-7137: Sorry [~gweidner], [~josephkb], just saw it was unassigned when I created the patch. Thanks. > Add checkInputColumn back to Params and print more info > --- > > Key: SPARK-7137 > URL: https://issues.apache.org/jira/browse/SPARK-7137 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.4.0 >Reporter: Joseph K. Bradley >Priority: Trivial > > In the PR for [https://issues.apache.org/jira/browse/SPARK-5957], > Params.checkInputColumn was moved to SchemaUtils and renamed to > checkColumnType. The downside is that it no longer has access to the > parameter info, so it cannot state which input column parameter was incorrect. > We should keep checkColumnType but also add checkInputColumn back to Params. > It should print out the parameter name and description. Internally, it may > call checkColumnType. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2572) Can't delete local dir on executor automatically when running spark over Mesos.
[ https://issues.apache.org/jira/browse/SPARK-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-2572. -- Resolution: Duplicate > Can't delete local dir on executor automatically when running spark over > Mesos. > --- > > Key: SPARK-2572 > URL: https://issues.apache.org/jira/browse/SPARK-2572 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.0.0 >Reporter: Yadong Qi >Priority: Minor > > When running Spark over Mesos in “fine-grained” or “coarse-grained” mode, the > local dir (/tmp/spark-local-20140718114058-834c) on the executor is not deleted > automatically after the application finishes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7486) Add the streaming implementation for estimating quantiles and median
[ https://issues.apache.org/jira/browse/SPARK-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535616#comment-14535616 ] Burak Yavuz commented on SPARK-7486: Yes, this is a clone of SPARK-6760 and SPARK-7246 (kinda). However, it will be in Spark 1.5. > Add the streaming implementation for estimating quantiles and median > > > Key: SPARK-7486 > URL: https://issues.apache.org/jira/browse/SPARK-7486 > Project: Spark > Issue Type: New Feature > Components: ML, SQL >Reporter: Liang-Chi Hsieh > > Streaming implementations that can estimate quantiles and medians are very useful > for ML algorithms and data statistics. > Apache DataFu Pig has this kind of implementation. We can port it to Spark. > Please refer to: > http://datafu.incubator.apache.org/docs/datafu/getting-started.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7245) Spearman correlation for DataFrames
[ https://issues.apache.org/jira/browse/SPARK-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz resolved SPARK-7245. Resolution: Done Fix Version/s: 1.4.0 > Spearman correlation for DataFrames > --- > > Key: SPARK-7245 > URL: https://issues.apache.org/jira/browse/SPARK-7245 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Xiangrui Meng > Fix For: 1.4.0 > > > Spearman correlation is harder than Pearson to compute. > ~~~ > df.stat.corr(col1, col2, method="spearman"): Double > ~~~ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
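For reference, a minimal sketch of how the proposed API from the ticket would be used once Spearman support lands; the column names and toy data are made up, and the "spearman" method string is an assumption taken from the ticket description rather than a released API.
{code}
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object SpearmanCorrSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spearman-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Monotone but non-linear relationship: this is where Spearman and Pearson diverge.
    val df = sc.parallelize(1 to 100).map(i => (i.toDouble, math.pow(i.toDouble, 3))).toDF("x", "y")

    // Proposed call from the ticket; "spearman" is assumed to be accepted once implemented.
    val rho = df.stat.corr("x", "y", "spearman")
    println(s"Spearman correlation: $rho")
    sc.stop()
  }
}
{code}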
[jira] [Resolved] (SPARK-7399) Master fails on 2.11 with compilation error
[ https://issues.apache.org/jira/browse/SPARK-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-7399. -- Resolution: Fixed Fix Version/s: 1.4.0 Assignee: Tijo Thomas > Master fails on 2.11 with compilation error > --- > > Key: SPARK-7399 > URL: https://issues.apache.org/jira/browse/SPARK-7399 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Iulian Dragos >Assignee: Tijo Thomas > Fix For: 1.4.0 > > > The current code in master (and 1.4 branch) fails on 2.11 with the following > compilation error: > {code} > [error] /home/ubuntu/workspace/Apache Spark (master) on > 2.11/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala:78: in > object RDDOperationScope, multiple overloaded alternatives of method > withScope define default arguments. > [error] private[spark] object RDDOperationScope { > [error] ^ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
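As background for readers hitting the same error, here is a minimal, self-contained illustration of the compiler restriction the message refers to; the names are made up and unrelated to RDDOperationScope, and this says nothing about why 2.10 and 2.11 behave differently on Spark's actual code.
{code}
object OverloadedDefaults {
  // This pair does NOT compile; scalac reports:
  //   "multiple overloaded alternatives of method withScope define default arguments"
  // def withScope(name: String = "scope"): String = name
  // def withScope(id: Int, name: String = "scope"): String = s"$id-$name"

  // A common fix is to keep the default argument on a single alternative only.
  def withScope(name: String = "scope"): String = name
  def withScope(id: Int, name: String): String = s"$id-$name"
}
{code}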
[jira] [Reopened] (SPARK-7245) Spearman correlation for DataFrames
[ https://issues.apache.org/jira/browse/SPARK-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz reopened SPARK-7245: Sorry, mixed this with Pearson correlation > Spearman correlation for DataFrames > --- > > Key: SPARK-7245 > URL: https://issues.apache.org/jira/browse/SPARK-7245 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Xiangrui Meng > Fix For: 1.4.0 > > > Spearman correlation is harder than Pearson to compute. > ~~~ > df.stat.corr(col1, col2, method="spearman"): Double > ~~~ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7435) Make DataFrame.show() consistent with that of Scala and pySpark
[ https://issues.apache.org/jira/browse/SPARK-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535614#comment-14535614 ] Rekha Joshi commented on SPARK-7435: Thank you [~shivaram] and [~sunrui] for the quick reply and good discussion. Updated the git patch per review comments. Thanks. > Make DataFrame.show() consistent with that of Scala and pySpark > --- > > Key: SPARK-7435 > URL: https://issues.apache.org/jira/browse/SPARK-7435 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 1.4.0 >Reporter: Sun Rui >Priority: Critical > > Currently in SparkR, DataFrame has two methods show() and showDF(). show() > prints the DataFrame column names and types and showDF() prints the first > numRows rows of a DataFrame. > In Scala and pySpark, show() is used to print rows of a DataFrame. > We'd better keep the API consistent unless there is some important reason. So > we propose to interchange the names (show() and showDF()) in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7486) Add the streaming implementation for estimating quantiles and median
[ https://issues.apache.org/jira/browse/SPARK-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535586#comment-14535586 ] Joseph K. Bradley commented on SPARK-7486: -- Ping [~brkyvz]: Aren't you looking at something like this? > Add the streaming implementation for estimating quantiles and median > > > Key: SPARK-7486 > URL: https://issues.apache.org/jira/browse/SPARK-7486 > Project: Spark > Issue Type: New Feature > Components: ML, SQL >Reporter: Liang-Chi Hsieh > > Streaming implementations that can estimate quantiles and medians are very useful > for ML algorithms and data statistics. > Apache DataFu Pig has this kind of implementation. We can port it to Spark. > Please refer to: > http://datafu.incubator.apache.org/docs/datafu/getting-started.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7483) [MLLib] Using Kryo with FPGrowth fails with an exception
[ https://issues.apache.org/jira/browse/SPARK-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535577#comment-14535577 ] Joseph K. Bradley commented on SPARK-7483: -- Does it fix anything if you give Kryo more info, such as explicit registration of relevant classes? > [MLLib] Using Kryo with FPGrowth fails with an exception > > > Key: SPARK-7483 > URL: https://issues.apache.org/jira/browse/SPARK-7483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.3.1 >Reporter: Tomasz Bartczak >Priority: Minor > > When using FPGrowth algorithm with KryoSerializer - Spark fails with > {code} > Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most > recent failure: Lost task 0.0 in stage 9.0 (TID 16, localhost): > com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: > Can not set final scala.collection.mutable.ListBuffer field > org.apache.spark.mllib.fpm.FPTree$Summary.nodes to > scala.collection.mutable.ArrayBuffer > Serialization trace: > nodes (org.apache.spark.mllib.fpm.FPTree$Summary) > org$apache$spark$mllib$fpm$FPTree$$summaries > (org.apache.spark.mllib.fpm.FPTree) > {code} > This can be easily reproduced in spark codebase by setting > {code} > conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") > {code} and running FPGrowthSuite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
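For anyone trying the suggestion above, this is roughly what "giving Kryo more info" looks like at the application level. The registered classes below are only illustrative: the failing FPTree classes are internal to MLlib and not accessible from user code, so this is a sketch of the workaround being asked about, not a confirmed fix.
{code}
import org.apache.spark.{SparkConf, SparkContext}

object KryoRegistrationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("fpgrowth-kryo")
      .setMaster("local[*]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Explicitly register the user-visible classes that flow through the shuffle.
      .registerKryoClasses(Array(
        classOf[Array[String]],
        classOf[scala.collection.mutable.ListBuffer[_]],
        classOf[scala.collection.mutable.ArrayBuffer[_]]))
    val sc = new SparkContext(conf)
    // ... run FPGrowth as usual on sc ...
    sc.stop()
  }
}
{code}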
[jira] [Reopened] (SPARK-7483) [MLLib] Using Kryo with FPGrowth fails with an exception
[ https://issues.apache.org/jira/browse/SPARK-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley reopened SPARK-7483: -- > [MLLib] Using Kryo with FPGrowth fails with an exception > > > Key: SPARK-7483 > URL: https://issues.apache.org/jira/browse/SPARK-7483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.3.1 >Reporter: Tomasz Bartczak >Priority: Minor > > When using FPGrowth algorithm with KryoSerializer - Spark fails with > {code} > Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most > recent failure: Lost task 0.0 in stage 9.0 (TID 16, localhost): > com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: > Can not set final scala.collection.mutable.ListBuffer field > org.apache.spark.mllib.fpm.FPTree$Summary.nodes to > scala.collection.mutable.ArrayBuffer > Serialization trace: > nodes (org.apache.spark.mllib.fpm.FPTree$Summary) > org$apache$spark$mllib$fpm$FPTree$$summaries > (org.apache.spark.mllib.fpm.FPTree) > {code} > This can be easily reproduced in spark codebase by setting > {code} > conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") > {code} and running FPGrowthSuite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-7483) [MLLib] Using Kryo with FPGrowth fails with an exception
[ https://issues.apache.org/jira/browse/SPARK-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535562#comment-14535562 ] Joseph K. Bradley edited comment on SPARK-7483 at 5/8/15 9:24 PM: -- (Updated) Maybe this is a bug...will look into it. was (Author: josephkb): I believe this is because it would need a custom serializer. Not all classes in Spark work with Kryo out of the box. But if you want to learn more and write your own, please check out: [http://spark.apache.org/docs/latest/tuning.html#data-serialization] Also, this kind of question should probably go to the user list before JIRA. I'll close this, but if you think I'm wrong, please bring up the issue on the user list! Thanks > [MLLib] Using Kryo with FPGrowth fails with an exception > > > Key: SPARK-7483 > URL: https://issues.apache.org/jira/browse/SPARK-7483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.3.1 >Reporter: Tomasz Bartczak >Priority: Minor > > When using FPGrowth algorithm with KryoSerializer - Spark fails with > {code} > Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most > recent failure: Lost task 0.0 in stage 9.0 (TID 16, localhost): > com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: > Can not set final scala.collection.mutable.ListBuffer field > org.apache.spark.mllib.fpm.FPTree$Summary.nodes to > scala.collection.mutable.ArrayBuffer > Serialization trace: > nodes (org.apache.spark.mllib.fpm.FPTree$Summary) > org$apache$spark$mllib$fpm$FPTree$$summaries > (org.apache.spark.mllib.fpm.FPTree) > {code} > This can be easily reproduced in spark codebase by setting > {code} > conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") > {code} and running FPGrowthSuite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7483) [MLLib] Using Kryo with FPGrowth fails with an exception
[ https://issues.apache.org/jira/browse/SPARK-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley closed SPARK-7483. Resolution: Not A Problem > [MLLib] Using Kryo with FPGrowth fails with an exception > > > Key: SPARK-7483 > URL: https://issues.apache.org/jira/browse/SPARK-7483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.3.1 >Reporter: Tomasz Bartczak >Priority: Minor > > When using FPGrowth algorithm with KryoSerializer - Spark fails with > {code} > Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most > recent failure: Lost task 0.0 in stage 9.0 (TID 16, localhost): > com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: > Can not set final scala.collection.mutable.ListBuffer field > org.apache.spark.mllib.fpm.FPTree$Summary.nodes to > scala.collection.mutable.ArrayBuffer > Serialization trace: > nodes (org.apache.spark.mllib.fpm.FPTree$Summary) > org$apache$spark$mllib$fpm$FPTree$$summaries > (org.apache.spark.mllib.fpm.FPTree) > {code} > This can be easily reproduced in spark codebase by setting > {code} > conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") > {code} and running FPGrowthSuite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7483) [MLLib] Using Kryo with FPGrowth fails with an exception
[ https://issues.apache.org/jira/browse/SPARK-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535562#comment-14535562 ] Joseph K. Bradley commented on SPARK-7483: -- I believe this is because it would need a custom serializer. Not all classes in Spark work with Kryo out of the box. But if you want to learn more and write your own, please check out: [http://spark.apache.org/docs/latest/tuning.html#data-serialization] Also, this kind of question should probably go to the user list before JIRA. I'll close this, but if you think I'm wrong, please bring up the issue on the user list! Thanks > [MLLib] Using Kryo with FPGrowth fails with an exception > > > Key: SPARK-7483 > URL: https://issues.apache.org/jira/browse/SPARK-7483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.3.1 >Reporter: Tomasz Bartczak >Priority: Minor > > When using FPGrowth algorithm with KryoSerializer - Spark fails with > {code} > Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most > recent failure: Lost task 0.0 in stage 9.0 (TID 16, localhost): > com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: > Can not set final scala.collection.mutable.ListBuffer field > org.apache.spark.mllib.fpm.FPTree$Summary.nodes to > scala.collection.mutable.ArrayBuffer > Serialization trace: > nodes (org.apache.spark.mllib.fpm.FPTree$Summary) > org$apache$spark$mllib$fpm$FPTree$$summaries > (org.apache.spark.mllib.fpm.FPTree) > {code} > This can be easily reproduced in spark codebase by setting > {code} > conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") > {code} and running FPGrowthSuite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6613) Starting stream from checkpoint causes Streaming tab to throw error
[ https://issues.apache.org/jira/browse/SPARK-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535563#comment-14535563 ] Tathagata Das commented on SPARK-6613: -- Any update with 1.3.1? > Starting stream from checkpoint causes Streaming tab to throw error > --- > > Key: SPARK-6613 > URL: https://issues.apache.org/jira/browse/SPARK-6613 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.2.1, 1.2.2 >Reporter: Marius Soutier > > When continuing my streaming job from a checkpoint, the job runs, but the > Streaming tab in the standard UI initially no longer works (browser just > shows HTTP ERROR: 500). Sometimes it gets back to normal after a while, and > sometimes it stays in this state permanently. > Stacktrace: > WARN org.eclipse.jetty.servlet.ServletHandler: /streaming/ > java.util.NoSuchElementException: key not found: 0 > at scala.collection.MapLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:58) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at scala.collection.AbstractMap.apply(Map.scala:58) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:151) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:150) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at scala.collection.immutable.Range.foreach(Range.scala:141) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:150) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:149) > at scala.Option.map(Option.scala:145) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener.lastReceivedBatchRecords(StreamingJobProgressListener.scala:149) > at > org.apache.spark.streaming.ui.StreamingPage.generateReceiverStats(StreamingPage.scala:82) > at > org.apache.spark.streaming.ui.StreamingPage.render(StreamingPage.scala:43) > at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68) > at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68) > at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:68) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:735) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:848) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) > 
at org.eclipse.jetty.server.Server.handle(Server.java:370) > at > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494) > at > org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971) > at > org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033) > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644) > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) > at > org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) > at > org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667) > at > org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) >
[jira] [Commented] (SPARK-2572) Can't delete local dir on executor automatically when running spark over Mesos.
[ https://issues.apache.org/jira/browse/SPARK-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535557#comment-14535557 ] Prasanna Gautam commented on SPARK-2572: This is still happening as of Spark 1.3.0 with PySpark: when the context is closed, the files aren't deleted. Neither does sc.clearFiles() seem to remove the /tmp/spark-* directories. > Can't delete local dir on executor automatically when running spark over > Mesos. > --- > > Key: SPARK-2572 > URL: https://issues.apache.org/jira/browse/SPARK-2572 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.0.0 >Reporter: Yadong Qi >Priority: Minor > > When running Spark over Mesos in “fine-grained” or “coarse-grained” > mode, the local > dir (/tmp/spark-local-20140718114058-834c) on the executor is not deleted > automatically after the application finishes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7398) Add back-pressure to Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-7398: --- Issue Type: Improvement (was: Bug) > Add back-pressure to Spark Streaming > > > Key: SPARK-7398 > URL: https://issues.apache.org/jira/browse/SPARK-7398 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.3.1 >Reporter: François Garillot > Labels: streams > > Spark Streaming has trouble dealing with situations where > batch processing time > batch interval > Meaning a high throughput of input data w.r.t. Spark's ability to remove data > from the queue. > If this throughput is sustained for long enough, it leads to an unstable > situation where the memory of the Receiver's Executor is overflowed. > This aims at transmitting a back-pressure signal back to data ingestion to > help with dealing with that high throughput, in a backwards-compatible way. > The design doc can be found here: > https://docs.google.com/document/d/1ZhiP_yBHcbjifz8nJEyPJpHqxB1FT6s8-Zk7sAfayQw/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7378) HistoryServer does not handle "deep" link when lazy loading app
[ https://issues.apache.org/jira/browse/SPARK-7378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-7378. Resolution: Fixed Fix Version/s: 1.4.0 Target Version/s: 1.4.0 > HistoryServer does not handle "deep" link when lazy loading app > --- > > Key: SPARK-7378 > URL: https://issues.apache.org/jira/browse/SPARK-7378 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Blocker > Fix For: 1.4.0 > > > This is a regression caused by SPARK-4705. When you go to a deep link into an > app that is not loaded yet, that used to work, but now that returns a 404. > You need to go into the root of the app first for the app to be loaded, which > is not the expected behaviour. > Fix coming up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7466) DAG visualization: orphaned nodes are not rendered correctly
[ https://issues.apache.org/jira/browse/SPARK-7466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-7466. Resolution: Fixed Fix Version/s: 1.4.0 > DAG visualization: orphaned nodes are not rendered correctly > > > Key: SPARK-7466 > URL: https://issues.apache.org/jira/browse/SPARK-7466 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 1.4.0 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Critical > Fix For: 1.4.0 > > Attachments: after.png, before.png > > > If you have an RDD instantiated outside of a scope, it is rendered as a weird > badge outside of a stage. This is because we keep the edge but do not inform > dagre-d3 of the node, resulting in the library rendering the node for us > without the expected styles and labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7489) Spark shell crashes when compiled with scala 2.11 and SPARK_PREPEND_CLASSES=true
[ https://issues.apache.org/jira/browse/SPARK-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-7489. Resolution: Fixed Fix Version/s: 1.4.0 Assignee: Vinod KC Target Version/s: 1.4.0 > Spark shell crashes when compiled with scala 2.11 and > SPARK_PREPEND_CLASSES=true > > > Key: SPARK-7489 > URL: https://issues.apache.org/jira/browse/SPARK-7489 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Reporter: Vinod KC >Assignee: Vinod KC > Fix For: 1.4.0 > > > Steps followed > >export SPARK_PREPEND_CLASSES=true > >dev/change-version-to-2.11.sh > > sbt/sbt -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean assembly > >bin/spark-shell > > 15/05/08 22:31:35 INFO Main: Created spark context.. > Spark context available as sc. > java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf > at java.lang.Class.getDeclaredConstructors0(Native Method) > at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671) > at java.lang.Class.getConstructor0(Class.java:3075) > at java.lang.Class.getConstructor(Class.java:1825) > at org.apache.spark.repl.Main$.createSQLContext(Main.scala:86) > ... 45 elided > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hive.conf.HiveConf > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 50 more > :11: error: not found: value sqlContext >import sqlContext.implicits._ > ^ > :11: error: not found: value sqlContext >import sqlContext.sql > There is a similar Resolved JIRA issue -SPARK-7470 and a PR > https://github.com/apache/spark/pull/5997 , which handled same issue only > in scala 2.10 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7443) MLlib 1.4 QA plan
[ https://issues.apache.org/jira/browse/SPARK-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7443: - Description: TODO: create JIRAs for each task and assign them accordingly. h2. API * Check API compliance using java-compliance-checker (SPARK-7458) * Audit new public APIs (from the generated html doc) ** Scala (do not forget to check the object doc) ** Java compatibility ** Python API coverage * audit Pipeline APIs ** feature transformers ** tree models ** elastic-net ** ML attributes ** developer APIs * graduate spark.ml from alpha ** remove AlphaComponent annotations ** remove mima excludes for spark.ml h2. Algorithms and performance * list missing performance tests from spark-perf * LDA online/EM (SPARK-7455) * ElasticNet for linear regression and logistic regression (SPARK-7456) * Bernoulli naive Bayes (SPARK-7453) * PIC (SPARK-7454) * ALS.recommendAll (SPARK-7457) * perf-tests in Python correctness: * PMML ** scoring using PMML evaluator vs. MLlib models * save/load h2. Documentation and example code * create JIRAs for the user guide to each new algorithm and assign them to the corresponding author * create example code for major components ** cross validation in python ** pipeline with complex feature transformations (scala/java/python) ** elastic-net (possibly with cross validation) was: TODO: create JIRAs for each task and assign them accordingly. h2. API * Check API compliance using java-compliance-checker (SPARK-7458) * Audit new public APIs (from the generated html doc) ** Scala (do not forget to check the object doc) ** Java compatibility ** Python API coverage * audit Pipeline APIs ** feature transformers ** tree models ** elastic-net ** ML attributes ** developer APIs * graduate spark.ml from alpha ** remove AlphaComponent annotations ** remove mima excludes for spark.ml h2. Algorithms and performance * list missing performance tests from spark-perf * LDA online/EM (SPARK-7455) * ElasticNet (SPARK-7456) * Bernoulli naive Bayes (SPARK-7453) * PIC (SPARK-7454) * ALS.recommendAll (SPARK-7457) * perf-tests in Python correctness: * PMML ** scoring using PMML evaluator vs. MLlib models * save/load h2. Documentation and example code * create JIRAs for the user guide to each new algorithm and assign them to the corresponding author * create example code for major components ** cross validation in python ** pipeline with complex feature transformations (scala/java/python) ** elastic-net (possibly with cross validation) > MLlib 1.4 QA plan > - > > Key: SPARK-7443 > URL: https://issues.apache.org/jira/browse/SPARK-7443 > Project: Spark > Issue Type: Umbrella > Components: ML, MLlib >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng >Assignee: Joseph K. Bradley >Priority: Critical > > TODO: create JIRAs for each task and assign them accordingly. > h2. API > * Check API compliance using java-compliance-checker (SPARK-7458) > * Audit new public APIs (from the generated html doc) > ** Scala (do not forget to check the object doc) > ** Java compatibility > ** Python API coverage > * audit Pipeline APIs > ** feature transformers > ** tree models > ** elastic-net > ** ML attributes > ** developer APIs > * graduate spark.ml from alpha > ** remove AlphaComponent annotations > ** remove mima excludes for spark.ml > h2. 
Algorithms and performance > * list missing performance tests from spark-perf > * LDA online/EM (SPARK-7455) > * ElasticNet for linear regression and logistic regression (SPARK-7456) > * Bernoulli naive Bayes (SPARK-7453) > * PIC (SPARK-7454) > * ALS.recommendAll (SPARK-7457) > * perf-tests in Python > correctness: > * PMML > ** scoring using PMML evaluator vs. MLlib models > * save/load > h2. Documentation and example code > * create JIRAs for the user guide to each new algorithm and assign them to > the corresponding author > * create example code for major components > ** cross validation in python > ** pipeline with complex feature transformations (scala/java/python) > ** elastic-net (possibly with cross validation) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7456) Perf test for linear regression and logistic regression with elastic-net
[ https://issues.apache.org/jira/browse/SPARK-7456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7456: - Summary: Perf test for linear regression and logistic regression with elastic-net (was: Perf test for linear regression with elastic-net) > Perf test for linear regression and logistic regression with elastic-net > > > Key: SPARK-7456 > URL: https://issues.apache.org/jira/browse/SPARK-7456 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng >Assignee: DB Tsai > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7490) MapOutputTracker: close input streams to free native memory
[ https://issues.apache.org/jira/browse/SPARK-7490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7490: - Assignee: Evan Jones > MapOutputTracker: close input streams to free native memory > --- > > Key: SPARK-7490 > URL: https://issues.apache.org/jira/browse/SPARK-7490 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Evan Jones >Assignee: Evan Jones >Priority: Minor > Fix For: 1.2.3, 1.3.2, 1.4.0 > > > GZIPInputStream allocates native memory that is not freed until close() or > when the finalizer runs. It is best to close() these streams explicitly to > avoid native memory leaks > Pull request here: https://github.com/apache/spark/pull/5982 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7490) MapOutputTracker: close input streams to free native memory
[ https://issues.apache.org/jira/browse/SPARK-7490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-7490. -- Resolution: Fixed Fix Version/s: 1.4.0 1.2.3 1.3.2 Issue resolved by pull request 5982 [https://github.com/apache/spark/pull/5982] > MapOutputTracker: close input streams to free native memory > --- > > Key: SPARK-7490 > URL: https://issues.apache.org/jira/browse/SPARK-7490 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Evan Jones >Priority: Minor > Fix For: 1.3.2, 1.2.3, 1.4.0 > > > GZIPInputStream allocates native memory that is not freed until close() or > when the finalizer runs. It is best to close() these streams explicitly to > avoid native memory leaks > Pull request here: https://github.com/apache/spark/pull/5982 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
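The underlying behaviour is not Spark-specific: a GZIPInputStream holds a native zlib buffer until close() is called or its finalizer eventually runs. Below is a small self-contained sketch of the pattern the fix applies, closing the stream in a finally block; the file and payload here are made up.
{code}
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipCloseSketch {
  def main(args: Array[String]): Unit = {
    val file = File.createTempFile("payload", ".gz")
    // Write a small gzipped payload so there is something to read back.
    val out = new GZIPOutputStream(new FileOutputStream(file))
    try out.write("hello, native memory".getBytes("UTF-8")) finally out.close()

    // Read it back, releasing the native zlib state as soon as we are done.
    val in = new GZIPInputStream(new FileInputStream(file))
    try {
      val buf = new Array[Byte](64)
      val n = math.max(in.read(buf), 0)
      println(new String(buf, 0, n, "UTF-8"))
    } finally {
      in.close() // frees native memory now, not whenever the finalizer happens to run
    }
    file.delete()
  }
}
{code}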
[jira] [Assigned] (SPARK-7492) Convert LocalDataFrame to LocalMatrix
[ https://issues.apache.org/jira/browse/SPARK-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7492: --- Assignee: Apache Spark > Convert LocalDataFrame to LocalMatrix > - > > Key: SPARK-7492 > URL: https://issues.apache.org/jira/browse/SPARK-7492 > Project: Spark > Issue Type: New Feature > Components: MLlib, SQL >Reporter: Burak Yavuz >Assignee: Apache Spark > > Having a method like, > {code:java} > Matrices.fromDataFrame(df) > {code} > would provide users the ability to perform feature selection with DataFrames. > Users will be able to chain operations like below: > {code:java} > import org.apache.spark.mllib.linalg.Matrices > import org.apache.spark.mllib.stat.Statistics > import org.apache.spark.sql.DataFrame > val df = ... // the DataFrame > val contingencyTable = df.stat.crosstab(col1, col2) > val ct = Matrices.fromDataFrame(contingencyTable) > val result: ChiSqTestResult = Statistics.chiSqTest(ct) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7492) Convert LocalDataFrame to LocalMatrix
[ https://issues.apache.org/jira/browse/SPARK-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7492: --- Assignee: (was: Apache Spark) > Convert LocalDataFrame to LocalMatrix > - > > Key: SPARK-7492 > URL: https://issues.apache.org/jira/browse/SPARK-7492 > Project: Spark > Issue Type: New Feature > Components: MLlib, SQL >Reporter: Burak Yavuz > > Having a method like, > {code:java} > Matrices.fromDataFrame(df) > {code} > would provide users the ability to perform feature selection with DataFrames. > Users will be able to chain operations like below: > {code:java} > import org.apache.spark.mllib.linalg.Matrices > import org.apache.spark.mllib.stat.Statistics > import org.apache.spark.sql.DataFrame > val df = ... // the DataFrame > val contingencyTable = df.stat.crosstab(col1, col2) > val ct = Matrices.fromDataFrame(contingencyTable) > val result: ChiSqTestResult = Statistics.chiSqTest(ct) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7492) Convert LocalDataFrame to LocalMatrix
[ https://issues.apache.org/jira/browse/SPARK-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535515#comment-14535515 ] Apache Spark commented on SPARK-7492: - User 'brkyvz' has created a pull request for this issue: https://github.com/apache/spark/pull/6018 > Convert LocalDataFrame to LocalMatrix > - > > Key: SPARK-7492 > URL: https://issues.apache.org/jira/browse/SPARK-7492 > Project: Spark > Issue Type: New Feature > Components: MLlib, SQL >Reporter: Burak Yavuz > > Having a method like, > {code:java} > Matrices.fromDataFrame(df) > {code} > would provide users the ability to perform feature selection with DataFrames. > Users will be able to chain operations like below: > {code:java} > import org.apache.spark.mllib.linalg.Matrices > import org.apache.spark.mllib.stat.Statistics > import org.apache.spark.sql.DataFrame > val df = ... // the DataFrame > val contingencyTable = df.stat.crosstab(col1, col2) > val ct = Matrices.fromDataFrame(contingencyTable) > val result: ChiSqTestResult = Statistics.chiSqTest(ct) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7496) Update Programming guide with Online LDA
Joseph K. Bradley created SPARK-7496: Summary: Update Programming guide with Online LDA Key: SPARK-7496 URL: https://issues.apache.org/jira/browse/SPARK-7496 Project: Spark Issue Type: Documentation Components: Documentation, MLlib Reporter: Joseph K. Bradley Priority: Minor Update LDA subsection of clustering section of MLlib programming guide to include OnlineLDA -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
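A candidate snippet for that guide section, assuming the new OnlineLDAOptimizer entry point in spark.mllib for 1.4; the optimizer class name and setters reflect my reading of the new API and should be verified against the merged code, and the corpus is a toy bag-of-words matrix.
{code}
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{SparkConf, SparkContext}

object OnlineLdaSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("online-lda").setMaster("local[*]"))

    // Toy corpus: (document id, term-count vector) pairs.
    val corpus = sc.parallelize(Seq(
      (0L, Vectors.dense(1.0, 2.0, 0.0, 5.0)),
      (1L, Vectors.dense(0.0, 1.0, 3.0, 0.0)),
      (2L, Vectors.dense(4.0, 0.0, 1.0, 2.0))))

    val model = new LDA()
      .setK(2)
      .setMaxIterations(10)
      .setOptimizer(new OnlineLDAOptimizer()) // assumed entry point for the online variant
      .run(corpus)

    println(s"Learned ${model.k} topics over a vocabulary of ${model.vocabSize} terms")
    sc.stop()
  }
}
{code}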
[jira] [Updated] (SPARK-7469) DAG visualization: show operators for SQL
[ https://issues.apache.org/jira/browse/SPARK-7469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-7469: - Attachment: after.png before.png > DAG visualization: show operators for SQL > - > > Key: SPARK-7469 > URL: https://issues.apache.org/jira/browse/SPARK-7469 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 1.4.0 >Reporter: Andrew Or >Assignee: Andrew Or > Attachments: after.png, before.png > > > Right now the DAG shows low level Spark operations when SQL users really care > about physical operators. We should show those instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7495) Improve ML attribute documentation
Joseph K. Bradley created SPARK-7495: Summary: Improve ML attribute documentation Key: SPARK-7495 URL: https://issues.apache.org/jira/browse/SPARK-7495 Project: Spark Issue Type: Documentation Components: Documentation, ML Reporter: Joseph K. Bradley Priority: Minor ML attribute documentation is currently minimal. This has led to confusion in some Spark PRs about how to use them. We should add: * Scala doc * examples in the programming guide The docs should make at least these items clear: * What the different attribute types are * How an attribute and attribute group differ * Example usage creating, modifying, and reading attributes * Explanation that missing attributes are OK and can be computed/added lazily -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
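To make the documentation gap concrete, here is the kind of example the guide could include, based on my reading of the org.apache.spark.ml.attribute API; treat the exact method names as assumptions to be checked against the Scala doc this issue asks for.
{code}
import org.apache.spark.ml.attribute.{Attribute, AttributeGroup, NominalAttribute, NumericAttribute}

object AttributeSketch {
  def main(args: Array[String]): Unit = {
    // Individual attributes describe single columns or single slots of a feature vector.
    val age = NumericAttribute.defaultAttr.withName("age")
    val gender = NominalAttribute.defaultAttr.withName("gender").withValues("male", "female")

    // An attribute group describes a whole vector column, one attribute per slot.
    val group = new AttributeGroup("features", Array[Attribute](age, gender))

    // Attributes travel with a DataFrame column as metadata; missing attributes are fine
    // and can be filled in lazily by later stages.
    println(group.toMetadata().json)
  }
}
{code}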
[jira] [Updated] (SPARK-7461) Remove spark.ml Model, and have all Transformers have parent
[ https://issues.apache.org/jira/browse/SPARK-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-7461: - Issue Type: Sub-task (was: Improvement) Parent: SPARK-5874 > Remove spark.ml Model, and have all Transformers have parent > > > Key: SPARK-7461 > URL: https://issues.apache.org/jira/browse/SPARK-7461 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Joseph K. Bradley > > A recent PR [https://github.com/apache/spark/pull/5980] brought up an issue > with the Model abstraction: There are transformers which could be > Transformers (created by a user) or Models (created by an Estimator). This > is the first instance, but there will be more such transformers in the future. > Some possible fixes are: > * Create 2 separate classes, 1 extending Transformer and 1 extending Model. > These would be essentially the same, and they could share code (or have 1 > wrap the other). This would bloat the API. > * Just use Model, with a possibly null parent class. There is precedence > (meta-algorithms like RandomForest producing weak hypothesis Models with no > parent). > * Change Transformer to have a parent which may be null. > ** *--> Unless there is strong disagreement, I think we should go with this > last option.* -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7494) spark.ml Model should call copyValues in construction
Joseph K. Bradley created SPARK-7494: Summary: spark.ml Model should call copyValues in construction Key: SPARK-7494 URL: https://issues.apache.org/jira/browse/SPARK-7494 Project: Spark Issue Type: Improvement Components: ML Reporter: Joseph K. Bradley Priority: Minor Currently, Estimators call Params.copyValues to copy parameters from themselves to the Model they create. The Model has its Estimator, so it could call copyValues upon construction. Note: I'm linking a patch which will remove Model and use Transformer instead, but this same fix with copyValues can be applied to Transformer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5980) Add GradientBoostedTrees Python examples to ML guide
[ https://issues.apache.org/jira/browse/SPARK-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-5980: - Target Version/s: 1.3.0 > Add GradientBoostedTrees Python examples to ML guide > > > Key: SPARK-5980 > URL: https://issues.apache.org/jira/browse/SPARK-5980 > Project: Spark > Issue Type: Documentation > Components: Documentation, MLlib >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley >Priority: Minor > Fix For: 1.3.0 > > > GBT now has a Python API and should have examples in the ML guide -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5980) Add GradientBoostedTrees Python examples to ML guide
[ https://issues.apache.org/jira/browse/SPARK-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-5980. -- Resolution: Fixed Fix Version/s: 1.3.0 > Add GradientBoostedTrees Python examples to ML guide > > > Key: SPARK-5980 > URL: https://issues.apache.org/jira/browse/SPARK-5980 > Project: Spark > Issue Type: Documentation > Components: Documentation, MLlib >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley >Priority: Minor > Fix For: 1.3.0 > > > GBT now has a Python API and should have examples in the ML guide -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7410) Add option to avoid broadcasting configuration with newAPIHadoopFile
[ https://issues.apache.org/jira/browse/SPARK-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535386#comment-14535386 ] Josh Rosen commented on SPARK-7410: --- We should confirm this, but if I recall the reason that we have to broadcast these separately has something to do with configuration mutability or thread-safety. Based on a quick glance at SPARK-2585, it looks like I tried folding this into the RDD broadcast but this caused performance issues for RDDs with huge numbers of tasks. If you're interested in fixing this, I'd take a closer look through that old JIRA to try to figure out whether its discussion is still relevant. > Add option to avoid broadcasting configuration with newAPIHadoopFile > > > Key: SPARK-7410 > URL: https://issues.apache.org/jira/browse/SPARK-7410 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Sandy Ryza > > I'm working with a Spark application that creates thousands of HadoopRDDs and > unions them together. Certain details of the way the data is stored require > this. > Creating ten thousand of these RDDs takes about 10 minutes, even before any > of them is used in an action. I dug into why this takes so long and it looks > like the overhead of broadcasting the Hadoop configuration is taking up most > of the time. In this case, the broadcasting isn't helpful because each > HadoopRDD only corresponds to one or two tasks. When I reverted the original > change that switched to broadcasting configurations, the time it took to > instantiate these RDDs improved 10x. > It would be nice if there was a way to turn this broadcasting off. Either > through a Spark configuration option, a Hadoop configuration option, or an > argument to hadoopFile / newAPIHadoopFile. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
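For context, a stripped-down sketch of the pattern described in the report: building many small Hadoop RDDs and unioning them, where each newAPIHadoopFile call currently broadcasts its own copy of the Hadoop configuration. The paths are placeholders and the RDD count is arbitrary.
{code}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object ManyHadoopRddsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("many-hadoop-rdds"))

    // Placeholder paths standing in for the thousands of small inputs in the report.
    val paths = (0 until 1000).map(i => s"hdfs:///data/part-$i")

    // One RDD per path; today each call broadcasts the Hadoop Configuration again,
    // which dominates the setup time when there are thousands of such RDDs.
    val rdds = paths.map { p =>
      sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](p).map(_._2.toString)
    }

    val combined = sc.union(rdds)
    println(combined.count())
    sc.stop()
  }
}
{code}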
[jira] [Created] (SPARK-7493) ALTER TABLE statement
Sergey Semichev created SPARK-7493: -- Summary: ALTER TABLE statement Key: SPARK-7493 URL: https://issues.apache.org/jira/browse/SPARK-7493 Project: Spark Issue Type: Bug Components: SQL Environment: Databricks cloud Reporter: Sergey Semichev Priority: Minor A full table name (database_name.table_name) cannot be used with the "ALTER TABLE" statement, although it works with CREATE TABLE. For example, "ALTER TABLE database_name.table_name ADD PARTITION (source_year='2014', source_month='01')" fails with: Error in SQL statement: java.lang.RuntimeException: org.apache.spark.sql.AnalysisException: mismatched input 'ADD' expecting KW_EXCHANGE near 'test_table' in alter exchange partition; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
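A sketch of the report through the SQL API (the database, table, and partition columns are made up); the failing statement is left as a comment so the rest runs, and the USE-then-ALTER form at the end is a possible workaround I have not verified, since the parse error only appears when the database prefix is on the ALTER statement itself.
{code}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object AlterTableRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("alter-table-repro"))
    val hive = new HiveContext(sc)

    hive.sql("CREATE DATABASE IF NOT EXISTS test_db")
    hive.sql("CREATE TABLE IF NOT EXISTS test_db.test_table (v STRING) " +
      "PARTITIONED BY (source_year STRING, source_month STRING)")

    // Reported to fail: mismatched input 'ADD' expecting KW_EXCHANGE near 'test_table'
    // hive.sql("ALTER TABLE test_db.test_table ADD PARTITION (source_year='2014', source_month='01')")

    // Possible workaround (unverified): select the database first, then use the bare table name.
    hive.sql("USE test_db")
    hive.sql("ALTER TABLE test_table ADD PARTITION (source_year='2014', source_month='01')")
    sc.stop()
  }
}
{code}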
[jira] [Updated] (SPARK-7492) Convert LocalDataFrame to LocalMatrix
[ https://issues.apache.org/jira/browse/SPARK-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz updated SPARK-7492: --- Description: Having a method like, {code:java} Matrices.fromDataFrame(df) {code} would provide users the ability to perform feature selection with DataFrames. Users will be able to chain operations like below: {code:java} import org.apache.spark.mllib.linalg.Matrices import org.apache.spark.mllib.stat.Statistics import org.apache.spark.sql.DataFrame val df = ... // the DataFrame val contingencyTable = df.stat.crosstab(col1, col2) val ct = Matrices.fromDataFrame(contingencyTable) val result: ChiSqTestResult = Statistics.chiSqTest(ct) {code} was: Having a method like, {code: java} Matrices.fromDataFrame(df) {code} would provide users the ability to perform feature selection with DataFrames. Users will be able to chain operations like below: {code: java} import org.apache.spark.mllib.linalg.Matrices import org.apache.spark.mllib.stat.Statistics import org.apache.spark.sql.DataFrame val df = ... // the DataFrame val contingencyTable = df.stat.crosstab(col1, col2) val ct = Matrices.fromDataFrame(contingencyTable) val result: ChiSqTestResult = Statistics.chiSqTest(ct) {code} > Convert LocalDataFrame to LocalMatrix > - > > Key: SPARK-7492 > URL: https://issues.apache.org/jira/browse/SPARK-7492 > Project: Spark > Issue Type: New Feature > Components: MLlib, SQL >Reporter: Burak Yavuz > > Having a method like, > {code:java} > Matrices.fromDataFrame(df) > {code} > would provide users the ability to perform feature selection with DataFrames. > Users will be able to chain operations like below: > {code:java} > import org.apache.spark.mllib.linalg.Matrices > import org.apache.spark.mllib.stat.Statistics > import org.apache.spark.sql.DataFrame > val df = ... // the DataFrame > val contingencyTable = df.stat.crosstab(col1, col2) > val ct = Matrices.fromDataFrame(contingencyTable) > val result: ChiSqTestResult = Statistics.chiSqTest(ct) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7492) Convert LocalDataFrame to LocalMatrix
[ https://issues.apache.org/jira/browse/SPARK-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz updated SPARK-7492: --- Description: Having a method like, {code: java} Matrices.fromDataFrame(df) {code} would provide users the ability to perform feature selection with DataFrames. Users will be able to chain operations like below: {code: java} import org.apache.spark.mllib.linalg.Matrices import org.apache.spark.mllib.stat.Statistics import org.apache.spark.sql.DataFrame val df = ... // the DataFrame val contingencyTable = df.stat.crosstab(col1, col2) val ct = Matrices.fromDataFrame(contingencyTable) val result: ChiSqTestResult = Statistics.chiSqTest(ct) {code} was: Having a method like, {code: scala} Matrices.fromDataFrame(df) {code} would provide users the ability to perform feature selection with DataFrames. Users will be able to chain operations like below: {code: scala} import org.apache.spark.mllib.linalg.Matrices import org.apache.spark.mllib.stat.Statistics import org.apache.spark.sql.DataFrame val df = ... // the DataFrame val contingencyTable = df.stat.crosstab(col1, col2) val ct = Matrices.fromDataFrame(contingencyTable) val result: ChiSqTestResult = Statistics.chiSqTest(ct) {code} > Convert LocalDataFrame to LocalMatrix > - > > Key: SPARK-7492 > URL: https://issues.apache.org/jira/browse/SPARK-7492 > Project: Spark > Issue Type: New Feature > Components: MLlib, SQL >Reporter: Burak Yavuz > > Having a method like, > {code: java} > Matrices.fromDataFrame(df) > {code} > would provide users the ability to perform feature selection with DataFrames. > Users will be able to chain operations like below: > {code: java} > import org.apache.spark.mllib.linalg.Matrices > import org.apache.spark.mllib.stat.Statistics > import org.apache.spark.sql.DataFrame > val df = ... // the DataFrame > val contingencyTable = df.stat.crosstab(col1, col2) > val ct = Matrices.fromDataFrame(contingencyTable) > val result: ChiSqTestResult = Statistics.chiSqTest(ct) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7492) Convert LocalDataFrame to LocalMatrix
Burak Yavuz created SPARK-7492: -- Summary: Convert LocalDataFrame to LocalMatrix Key: SPARK-7492 URL: https://issues.apache.org/jira/browse/SPARK-7492 Project: Spark Issue Type: New Feature Components: MLlib, SQL Reporter: Burak Yavuz Having a method like, {code: scala} Matrices.fromDataFrame(df) {code} would provide users the ability to perform feature selection with DataFrames. Users will be able to chain operations like below: {code: scala} import org.apache.spark.mllib.linalg.Matrices import org.apache.spark.mllib.stat.Statistics import org.apache.spark.sql.DataFrame val df = ... // the DataFrame val contingencyTable = df.stat.crosstab(col1, col2) val ct = Matrices.fromDataFrame(contingencyTable) val result: ChiSqTestResult = Statistics.chiSqTest(ct) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7491) Handle drivers for Metastore JDBC
Michael Armbrust created SPARK-7491: --- Summary: Handle drivers for Metastore JDBC Key: SPARK-7491 URL: https://issues.apache.org/jira/browse/SPARK-7491 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7487) Python API for ml.regression
[ https://issues.apache.org/jira/browse/SPARK-7487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535324#comment-14535324 ] Apache Spark commented on SPARK-7487: - User 'brkyvz' has created a pull request for this issue: https://github.com/apache/spark/pull/6016 > Python API for ml.regression > > > Key: SPARK-7487 > URL: https://issues.apache.org/jira/browse/SPARK-7487 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Reporter: Burak Yavuz > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7487) Python API for ml.regression
[ https://issues.apache.org/jira/browse/SPARK-7487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7487: --- Assignee: (was: Apache Spark) > Python API for ml.regression > > > Key: SPARK-7487 > URL: https://issues.apache.org/jira/browse/SPARK-7487 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Reporter: Burak Yavuz > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7487) Python API for ml.regression
[ https://issues.apache.org/jira/browse/SPARK-7487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7487: --- Assignee: Apache Spark > Python API for ml.regression > > > Key: SPARK-7487 > URL: https://issues.apache.org/jira/browse/SPARK-7487 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Reporter: Burak Yavuz >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7448) Implement custom byte array serializer for use in PySpark shuffle
[ https://issues.apache.org/jira/browse/SPARK-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535316#comment-14535316 ] Josh Rosen commented on SPARK-7448: --- This is a change that would be nice to benchmark for performance. It might require a large job, such as a huge flatMap, before we see any significant improvement here. > Implement custom byte array serializer for use in PySpark shuffle > > > Key: SPARK-7448 > URL: https://issues.apache.org/jira/browse/SPARK-7448 > Project: Spark > Issue Type: Improvement > Components: PySpark, Shuffle >Reporter: Josh Rosen > > PySpark's shuffle typically shuffles Java RDDs that contain byte arrays. We > should implement a custom Serializer for use in these shuffles. This will > allow us to take advantage of shuffle optimizations like SPARK-7311 for > PySpark without requiring users to change the default serializer to > KryoSerializer (this is useful for JobServer-type applications). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
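To make the idea concrete: the records PySpark shuffles are already opaque byte arrays (pickled on the Python side), so a dedicated serializer can skip Java/Kryo object serialization and just write length-prefixed bytes. The sketch below shows only that framing with plain Java streams, not Spark's Serializer interface, as a rough illustration of what such a serializer would do.
{code}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object ByteArrayFramingSketch {
  // Write each record as a 4-byte length followed by the raw bytes.
  def writeRecord(out: DataOutputStream, record: Array[Byte]): Unit = {
    out.writeInt(record.length)
    out.write(record)
  }

  // Read one length-prefixed record; assumes the stream is positioned at a record boundary.
  def readRecord(in: DataInputStream): Array[Byte] = {
    val len = in.readInt()
    val buf = new Array[Byte](len)
    in.readFully(buf)
    buf
  }

  def main(args: Array[String]): Unit = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    Seq("pickled-record-1", "pickled-record-2").foreach(s => writeRecord(out, s.getBytes("UTF-8")))
    out.flush()

    val in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray))
    println(new String(readRecord(in), "UTF-8"))
    println(new String(readRecord(in), "UTF-8"))
  }
}
{code}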
[jira] [Updated] (SPARK-7448) Implement custom byte array serializer for use in PySpark shuffle
[ https://issues.apache.org/jira/browse/SPARK-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-7448: -- Priority: Minor (was: Major) > Implement custom byte array serializer for use in PySpark shuffle > > > Key: SPARK-7448 > URL: https://issues.apache.org/jira/browse/SPARK-7448 > Project: Spark > Issue Type: Improvement > Components: PySpark, Shuffle >Reporter: Josh Rosen >Priority: Minor > > PySpark's shuffle typically shuffles Java RDDs that contain byte arrays. We > should implement a custom Serializer for use in these shuffles. This will > allow us to take advantage of shuffle optimizations like SPARK-7311 for > PySpark without requiring users to change the default serializer to > KryoSerializer (this is useful for JobServer-type applications). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7488) Python API for ml.recommendation
[ https://issues.apache.org/jira/browse/SPARK-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7488: --- Assignee: (was: Apache Spark) > Python API for ml.recommendation > > > Key: SPARK-7488 > URL: https://issues.apache.org/jira/browse/SPARK-7488 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Reporter: Burak Yavuz > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7488) Python API for ml.recommendation
[ https://issues.apache.org/jira/browse/SPARK-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535284#comment-14535284 ] Apache Spark commented on SPARK-7488: - User 'brkyvz' has created a pull request for this issue: https://github.com/apache/spark/pull/6015 > Python API for ml.recommendation > > > Key: SPARK-7488 > URL: https://issues.apache.org/jira/browse/SPARK-7488 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Reporter: Burak Yavuz > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7488) Python API for ml.recommendation
[ https://issues.apache.org/jira/browse/SPARK-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7488: --- Assignee: Apache Spark > Python API for ml.recommendation > > > Key: SPARK-7488 > URL: https://issues.apache.org/jira/browse/SPARK-7488 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Reporter: Burak Yavuz >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7490) MapOutputTracker: close input streams to free native memory
[ https://issues.apache.org/jira/browse/SPARK-7490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535264#comment-14535264 ] Apache Spark commented on SPARK-7490: - User 'evanj' has created a pull request for this issue: https://github.com/apache/spark/pull/5982 > MapOutputTracker: close input streams to free native memory > --- > > Key: SPARK-7490 > URL: https://issues.apache.org/jira/browse/SPARK-7490 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Evan Jones >Priority: Minor > > GZIPInputStream allocates native memory that is not freed until close() or > when the finalizer runs. It is best to close() these streams explicitly to > avoid native memory leaks > Pull request here: https://github.com/apache/spark/pull/5982 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org