[jira] [Commented] (SPARK-12957) Derive and propagate data constraints in logical plan
[ https://issues.apache.org/jira/browse/SPARK-12957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188856#comment-15188856 ] Sameer Agarwal commented on SPARK-12957: [~ksunitha] I've attached a copy of the design document to the JIRA: https://issues.apache.org/jira/secure/attachment/12792466/ConstraintPropagationinSparkSQL.pdf. Thanks! > Derive and propagate data constraints in logical plan > - > > Key: SPARK-12957 > URL: https://issues.apache.org/jira/browse/SPARK-12957 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Yin Huai >Assignee: Sameer Agarwal > Attachments: ConstraintPropagationinSparkSQL.pdf > > > Based on the semantics of a query plan, we can derive data constraints (e.g. if > a filter defines {{a > 10}}, we know that the output data of this filter > satisfies the constraints {{a > 10}} and {{a is not null}}). We should build a > framework to derive and propagate constraints in the logical plan, which can > help us build more advanced optimizations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
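The derivation described above can be sketched in a few lines of plain Python. This is a toy model with hypothetical class names, not Spark's actual Catalyst expressions: the idea is simply that a comparison predicate in a filter implies both the predicate itself and an `IS NOT NULL` constraint on the attribute it references.

```python
# Toy sketch of constraint derivation from a filter predicate (hypothetical
# classes, not Spark's Catalyst): a comparison like a > 10 yields both the
# predicate itself and an implied IS NOT NULL constraint.
from dataclasses import dataclass

@dataclass(frozen=True)
class GreaterThan:
    attr: str
    value: int

@dataclass(frozen=True)
class IsNotNull:
    attr: str

def derive_constraints(predicates):
    """Return the set of constraints implied by the filter predicates."""
    constraints = set()
    for p in predicates:
        constraints.add(p)
        # A null attribute cannot satisfy a comparison, so any attribute a
        # comparison references must be non-null downstream of the filter.
        if isinstance(p, GreaterThan):
            constraints.add(IsNotNull(p.attr))
    return constraints

cs = derive_constraints([GreaterThan("a", 10)])
assert GreaterThan("a", 10) in cs and IsNotNull("a") in cs
```

A framework like the one proposed would propagate such derived constraints up through the plan, so later optimizer rules can, for example, drop redundant null checks.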
[jira] [Updated] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists
[ https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-13798: Description: If the Aggregate does not have any aggregate expression, it is useless. We can replace Aggregate by Project. Then, Project can be collapsed or pushed down further. This only makes sense when the grouping and aggregate are identical. Thus, we will do it later was: If the Aggregate does not have any aggregate expression, it is useless. We can replace Aggregate by Project. Then, Project can be collapsed or pushed down further. > Replace Aggregate by Project if no Aggregate Function exists > > > Key: SPARK-13798 > URL: https://issues.apache.org/jira/browse/SPARK-13798 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > If the Aggregate does not have any aggregate expression, it is useless. We > can replace Aggregate by Project. > Then, Project can be collapsed or pushed down further. > This only makes sense when the grouping and aggregate are identical. Thus, > we will do it later -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
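The shape of the proposed rewrite can be illustrated with toy plan nodes (hypothetical classes, not Catalyst): if none of an Aggregate's result expressions call an aggregate function, the node is rewritten to a Project over the same child.

```python
# Minimal sketch of the proposed rule using toy plan nodes (not Spark's
# actual Catalyst classes): an Aggregate whose result expressions contain
# no aggregate function calls is replaced by an equivalent Project.
from dataclasses import dataclass
from typing import List

AGG_FUNCTIONS = {"sum", "count", "avg", "min", "max"}

@dataclass
class Relation:
    name: str

@dataclass
class Project:
    exprs: List[str]
    child: object

@dataclass
class Aggregate:
    grouping: List[str]
    exprs: List[str]
    child: object

def contains_agg(expr: str) -> bool:
    # Crude textual check: does the expression call a known aggregate function?
    return any(expr.startswith(f + "(") for f in AGG_FUNCTIONS)

def replace_aggregate_with_project(plan):
    if isinstance(plan, Aggregate) and not any(contains_agg(e) for e in plan.exprs):
        return Project(plan.exprs, plan.child)
    return plan

t = Relation("t")
assert isinstance(replace_aggregate_with_project(Aggregate(["a"], ["a"], t)), Project)
assert isinstance(replace_aggregate_with_project(Aggregate(["a"], ["sum(b)"], t)), Aggregate)
```

Note the ticket's own caveat: a grouped Aggregate also deduplicates rows, which a plain Project does not, so the rewrite is only semantics-preserving in the restricted case the description mentions (grouping and aggregate expressions identical).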
[jira] [Closed] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists
[ https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-13798. --- Resolution: Later > Replace Aggregate by Project if no Aggregate Function exists > > > Key: SPARK-13798 > URL: https://issues.apache.org/jira/browse/SPARK-13798 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > If the Aggregate does not have any aggregate expression, it is useless. We > can replace Aggregate by Project. > Then, Project can be collapsed or pushed down further. > This only makes sense when the grouping and aggregate are identical. Thus, > we will do it later -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12957) Derive and propagate data constraints in logical plan
[ https://issues.apache.org/jira/browse/SPARK-12957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sameer Agarwal updated SPARK-12957: --- Attachment: ConstraintPropagationinSparkSQL.pdf Design Document > Derive and propagate data constraints in logical plan > - > > Key: SPARK-12957 > URL: https://issues.apache.org/jira/browse/SPARK-12957 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Yin Huai >Assignee: Sameer Agarwal > Attachments: ConstraintPropagationinSparkSQL.pdf > > > Based on the semantics of a query plan, we can derive data constraints (e.g. if > a filter defines {{a > 10}}, we know that the output data of this filter > satisfies the constraints {{a > 10}} and {{a is not null}}). We should build a > framework to derive and propagate constraints in the logical plan, which can > help us build more advanced optimizations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists
[ https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reopened SPARK-13798: - > Replace Aggregate by Project if no Aggregate Function exists > > > Key: SPARK-13798 > URL: https://issues.apache.org/jira/browse/SPARK-13798 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > If the Aggregate does not have any aggregate expression, it is useless. We > can replace Aggregate by Project. > Then, Project can be collapsed or pushed down further. > This only makes sense when the grouping and aggregate are identical. Thus, > we will do it later -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists
[ https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-13798. --- Resolution: Invalid > Replace Aggregate by Project if no Aggregate Function exists > > > Key: SPARK-13798 > URL: https://issues.apache.org/jira/browse/SPARK-13798 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > If the Aggregate does not have any aggregate expression, it is useless. We > can replace Aggregate by Project. > Then, Project can be collapsed or pushed down further. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists
[ https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-13798: Description: If the Aggregate does not have any aggregate expression, it is useless. We can replace Aggregate by Project. Then, Project can be collapsed or pushed down further. was:If the Aggregate does not have any aggregate expression, it is useless. We can replace Aggregate by Project. > Replace Aggregate by Project if no Aggregate Function exists > > > Key: SPARK-13798 > URL: https://issues.apache.org/jira/browse/SPARK-13798 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > If the Aggregate does not have any aggregate expression, it is useless. We > can replace Aggregate by Project. > Then, Project can be collapsed or pushed down further. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13134) add 'spark.streaming.kafka.partition.multiplier' into SparkConf
[ https://issues.apache.org/jira/browse/SPARK-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188834#comment-15188834 ] Sean Owen commented on SPARK-13134: --- I don't understand why there are two JIRAs here, why you attached a patch, or why you did that after closing both? > add 'spark.streaming.kafka.partition.multiplier' into SparkConf > --- > > Key: SPARK-13134 > URL: https://issues.apache.org/jira/browse/SPARK-13134 > Project: Spark > Issue Type: Sub-task > Components: Input/Output >Affects Versions: 1.6.1 >Reporter: zhengcanbin > Attachments: 13134.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13706) Python Example for Train Validation Split Missing
[ https://issues.apache.org/jira/browse/SPARK-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-13706. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11547 [https://github.com/apache/spark/pull/11547] > Python Example for Train Validation Split Missing > - > > Key: SPARK-13706 > URL: https://issues.apache.org/jira/browse/SPARK-13706 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Reporter: Jeremy >Assignee: Jeremy >Priority: Minor > Fix For: 2.0.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > An example of how to use TrainValidationSplit in pyspark needs to be added. > Should be consistent with the current examples. I'll submit a PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13796) Lock release errors occur frequently in executor logs
[ https://issues.apache.org/jira/browse/SPARK-13796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-13796: -- Priority: Minor (was: Major) Do you have more detail on how to reproduce it? It's good that you think you see what introduced the problem, but how about linking to the commit or leaving some analysis of how it was introduced? Is there any actual impact, or just log noise? > Lock release errors occur frequently in executor logs > - > > Key: SPARK-13796 > URL: https://issues.apache.org/jira/browse/SPARK-13796 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Nishkam Ravi >Priority: Minor > > Executor logs contain a lot of these error messages (irrespective of the > workload): > 16/03/08 17:53:07 ERROR executor.Executor: 1 block locks were not released by > TID = 1119 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists
[ https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188819#comment-15188819 ] Apache Spark commented on SPARK-13798: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/11565 > Replace Aggregate by Project if no Aggregate Function exists > > > Key: SPARK-13798 > URL: https://issues.apache.org/jira/browse/SPARK-13798 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > If the Aggregate does not have any aggregate expression, it is useless. We > can replace Aggregate by Project. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13797) Eliminate Unnecessary Window
[ https://issues.apache.org/jira/browse/SPARK-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13797: Assignee: (was: Apache Spark) > Eliminate Unnecessary Window > > > Key: SPARK-13797 > URL: https://issues.apache.org/jira/browse/SPARK-13797 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > If the Window does not have any window expression, it is useless. It might > happen after column pruning -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13797) Eliminate Unnecessary Window
[ https://issues.apache.org/jira/browse/SPARK-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13797: Assignee: Apache Spark > Eliminate Unnecessary Window > > > Key: SPARK-13797 > URL: https://issues.apache.org/jira/browse/SPARK-13797 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Apache Spark > > If the Window does not have any window expression, it is useless. It might > happen after column pruning -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13797) Eliminate Unnecessary Window
[ https://issues.apache.org/jira/browse/SPARK-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188818#comment-15188818 ] Apache Spark commented on SPARK-13797: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/11565 > Eliminate Unnecessary Window > > > Key: SPARK-13797 > URL: https://issues.apache.org/jira/browse/SPARK-13797 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > If the Window does not have any window expression, it is useless. It might > happen after column pruning -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists
[ https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13798: Assignee: Apache Spark > Replace Aggregate by Project if no Aggregate Function exists > > > Key: SPARK-13798 > URL: https://issues.apache.org/jira/browse/SPARK-13798 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Apache Spark > > If the Aggregate does not have any aggregate expression, it is useless. We > can replace Aggregate by Project. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists
[ https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13798: Assignee: (was: Apache Spark) > Replace Aggregate by Project if no Aggregate Function exists > > > Key: SPARK-13798 > URL: https://issues.apache.org/jira/browse/SPARK-13798 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > If the Aggregate does not have any aggregate expression, it is useless. We > can replace Aggregate by Project. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists
[ https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-13798: Description: If the Aggregate does not have any aggregate expression, it is useless. We can replace Aggregate by Project. > Replace Aggregate by Project if no Aggregate Function exists > > > Key: SPARK-13798 > URL: https://issues.apache.org/jira/browse/SPARK-13798 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > If the Aggregate does not have any aggregate expression, it is useless. We > can replace Aggregate by Project. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists
Xiao Li created SPARK-13798: --- Summary: Replace Aggregate by Project if no Aggregate Function exists Key: SPARK-13798 URL: https://issues.apache.org/jira/browse/SPARK-13798 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.0.0 Reporter: Xiao Li -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13797) Eliminate Unnecessary Window
Xiao Li created SPARK-13797: --- Summary: Eliminate Unnecessary Window Key: SPARK-13797 URL: https://issues.apache.org/jira/browse/SPARK-13797 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.0.0 Reporter: Xiao Li -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13797) Eliminate Unnecessary Window
[ https://issues.apache.org/jira/browse/SPARK-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-13797: Description: If the Window does not have any window expression, it is useless. It might happen after column pruning > Eliminate Unnecessary Window > > > Key: SPARK-13797 > URL: https://issues.apache.org/jira/browse/SPARK-13797 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > If the Window does not have any window expression, it is useless. It might > happen after column pruning -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
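The companion rule for SPARK-13797 is even simpler to sketch. Again using hypothetical toy plan nodes rather than Catalyst classes: a Window operator whose list of window expressions has become empty (e.g. after column pruning) computes nothing, so the optimizer can return its child directly.

```python
# Toy sketch of eliminating an unnecessary Window (hypothetical plan nodes,
# not Spark's Catalyst): if no window expressions remain, drop the operator.
from dataclasses import dataclass
from typing import List

@dataclass
class Relation:
    name: str

@dataclass
class Window:
    window_exprs: List[str]
    partition_spec: List[str]
    child: object

def eliminate_empty_window(plan):
    if isinstance(plan, Window) and not plan.window_exprs:
        return plan.child  # nothing to compute; remove the node
    return plan

t = Relation("t")
# After column pruning removed all window expressions, the node disappears:
assert eliminate_empty_window(Window([], ["a"], t)) is t
# A Window that still computes something is left alone:
w = Window(["rank()"], ["a"], t)
assert eliminate_empty_window(w) is w
```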
[jira] [Updated] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models
[ https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13430: --- Assignee: Bryan Cutler > Expose ml summary function in PySpark for classification and regression models > -- > > Key: SPARK-13430 > URL: https://issues.apache.org/jira/browse/SPARK-13430 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Reporter: Shubhanshu Mishra >Assignee: Bryan Cutler > Labels: classification, java, ml, mllib, pyspark, regression, > scala, sparkr > > I think model summary interface which is available in Spark's scala, Java and > R interfaces should also be available in the python interface. > Similar to #SPARK-11494 > https://issues.apache.org/jira/browse/SPARK-11494 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12626) MLlib 2.0 Roadmap
[ https://issues.apache.org/jira/browse/SPARK-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188774#comment-15188774 ] Nick Pentreath commented on SPARK-12626: [~dbtsai] ok thanks - would like to take a look when it's ready. > MLlib 2.0 Roadmap > - > > Key: SPARK-12626 > URL: https://issues.apache.org/jira/browse/SPARK-12626 > Project: Spark > Issue Type: Umbrella > Components: ML, MLlib >Reporter: Joseph K. Bradley >Assignee: Xiangrui Meng >Priority: Blocker > Labels: roadmap > > This is a master list for MLlib improvements we plan to have in Spark 2.0. > Please view this list as a wish list rather than a concrete plan, because we > don't have an accurate estimate of available resources. Due to limited review > bandwidth, features appearing on this list will get higher priority during > code review. But feel free to suggest new items to the list in comments. We > are experimenting with this process. Your feedback would be greatly > appreciated. > h1. Instructions > h2. For contributors: > * Please read > https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark > carefully. Code style, documentation, and unit tests are important. > * If you are a first-time Spark contributor, please always start with a > [starter task|https://issues.apache.org/jira/issues/?filter=12333209] rather > than a medium/big feature. Based on our experience, mixing the development > process with a big feature usually causes long delays in code review. > * Never work silently. Let everyone know on the corresponding JIRA page when > you start working on some features. This is to avoid duplicate work. For > small features, you don't need to wait to get JIRA assigned. > * For medium/big features or features with dependencies, please get assigned > first before coding and keep the ETA updated on the JIRA. If there is no > activity on the JIRA page for a certain amount of time, the JIRA should be > released for other contributors.
> * Do not claim multiple (>3) JIRAs at the same time. Try to finish them one > after another. > * Remember to add the `@Since("2.0.0")` annotation to new public APIs. > * Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code > review greatly helps to improve others' code as well as yours. > h2. For committers: > * Try to break down big features into small and specific JIRA tasks and link > them properly. > * Add a "starter" label to starter tasks. > * Put a rough estimate for medium/big features and track the progress. > * If you start reviewing a PR, please add yourself to the Shepherd field on > JIRA. > * If the code looks good to you, please comment "LGTM". For non-trivial PRs, > please ping a maintainer to make a final pass. > * After merging a PR, create and link JIRAs for Python, example code, and > documentation if applicable. > h1. Roadmap (*WIP*) > This is NOT [a complete list of MLlib JIRAs for > 2.0|https://issues.apache.org/jira/issues/?filter=12334385]. We only include > umbrella JIRAs and high-level tasks. > Major efforts in this release: > * `spark.ml`: Achieve feature parity for the `spark.ml` API, relative to the > `spark.mllib` API. This includes the Python API. > * Linear algebra: Separate out the linear algebra library as a standalone > project without a Spark dependency to simplify production deployment. > * Pipelines API: Complete critical improvements to the Pipelines API > * New features: As usual, we expect to expand the feature set of MLlib. > However, we will prioritize API parity over new features. _New algorithms > should be written for `spark.ml`, not `spark.mllib`._ > h2. 
Algorithms and performance > * iteratively re-weighted least squares (IRLS) for GLMs (SPARK-9835) > * estimator interface for GLMs (SPARK-12811) > * extended support for GLM model families and link functions in SparkR > (SPARK-12566) > * improved model summaries and stats via IRLS (SPARK-9837) > Additional (maybe lower priority): > * robust linear regression with Huber loss (SPARK-3181) > * vector-free L-BFGS (SPARK-10078) > * tree partition by features (SPARK-3717) > * local linear algebra (SPARK-6442) > * weighted instance support (SPARK-9610) > ** random forest (SPARK-9478) > ** GBT (SPARK-9612) > * locality sensitive hashing (LSH) (SPARK-5992) > * deep learning (SPARK-5575) > ** autoencoder (SPARK-10408) > ** restricted Boltzmann machine (RBM) (SPARK-4251) > ** convolutional neural network (stretch) > * factorization machine (SPARK-7008) > * distributed LU decomposition (SPARK-8514) > h2. Statistics > * bivariate statistics as UDAFs (SPARK-10385) > * R-like statistics for GLMs (SPARK-9835) > * sketch algorithms (cross listed) : approximate quantiles (SPARK-6761), > count-min sketch (SPARK-6763), Bloom filter (SPARK-1281
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188761#comment-15188761 ] Praveen Devarao commented on SPARK-12177: - Hi Cody, the last time, when I got the TopicPartition and OffsetAndMetadata classes made serializable, the argument was that these are metadata classes used by end users and needed for checkpointing. As for ConsumerRecord, this class is meant to hold the actual data and would usually not be needed for checkpointing; if we need the data, we can always go back to the respective offset in the respective topic and partition. Also, the ConsumerRecord class has members of generic type (K and V), so serialization really depends on what type of object the user flows in and whether that type is serializable. Given this, from a Kafka perspective I am not sure why one would want to mark this class as serializable. Thanks, Praveen > Update KafkaDStreams to new Kafka 0.9 Consumer API > -- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 is already released and it introduces a new consumer API that is not > compatible with the old one. So, I added the new consumer API. I made separate > classes in package org.apache.spark.streaming.kafka.v09 with the changed API. I > didn't remove the old classes, for more backward compatibility. Users will not need > to change their old Spark applications when they upgrade to the new Spark version. > Please review my changes -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
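The checkpointing argument in the comment above can be illustrated with a small self-contained sketch (plain Python with hypothetical names, no Kafka dependency): rather than serializing whole ConsumerRecord objects, only the small (topic, partition, offset) metadata is persisted, and the actual value is re-read from the log when needed.

```python
# Sketch of checkpointing record *metadata* instead of record *data*
# (hypothetical stand-in classes, not the Kafka client API): the pointer is
# trivially serializable regardless of the record's generic K/V types.
from dataclasses import dataclass

@dataclass(frozen=True)
class RecordPointer:
    topic: str
    partition: int
    offset: int

class FakeBroker:
    """Stand-in for a Kafka broker: values addressable by (topic, partition, offset)."""
    def __init__(self):
        self._log = {}

    def append(self, topic, partition, offset, value):
        self._log[(topic, partition, offset)] = value
        return RecordPointer(topic, partition, offset)

    def fetch(self, ptr: RecordPointer):
        # On recovery, go back to the respective offset for the actual data.
        return self._log[(ptr.topic, ptr.partition, ptr.offset)]

broker = FakeBroker()
ptr = broker.append("events", 0, 42, "payload")
# Only the small, easily serializable pointer goes into the checkpoint...
assert ptr == RecordPointer("events", 0, 42)
# ...and the data itself is re-fetched from the log afterwards.
assert broker.fetch(ptr) == "payload"
```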
[jira] [Commented] (SPARK-13796) Lock release errors occur frequently in executor logs
[ https://issues.apache.org/jira/browse/SPARK-13796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188762#comment-15188762 ] Nishkam Ravi commented on SPARK-13796: -- Running with master from March 7th (e52e597db48d069b98c1d404b221d3365f38fbb8). Error introduced by 633d63a48ad98754dc7c56f9ac150fc2aa4e42c5 > Lock release errors occur frequently in executor logs > - > > Key: SPARK-13796 > URL: https://issues.apache.org/jira/browse/SPARK-13796 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Nishkam Ravi > > Executor logs contain a lot of these error messages (irrespective of the > workload): > 16/03/08 17:53:07 ERROR executor.Executor: 1 block locks were not released by > TID = 1119 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13796) Lock release errors occur frequently in executor logs
Nishkam Ravi created SPARK-13796: Summary: Lock release errors occur frequently in executor logs Key: SPARK-13796 URL: https://issues.apache.org/jira/browse/SPARK-13796 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.0.0 Reporter: Nishkam Ravi Executor logs contain a lot of these error messages (irrespective of the workload): 16/03/08 17:53:07 ERROR executor.Executor: 1 block locks were not released by TID = 1119 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9289) OrcPartitionDiscoverySuite is slow to run
[ https://issues.apache.org/jira/browse/SPARK-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188710#comment-15188710 ] Dongjoon Hyun commented on SPARK-9289: -- Oh, it wasn't fast enough. I see. Thank you anyway. > OrcPartitionDiscoverySuite is slow to run > - > > Key: SPARK-9289 > URL: https://issues.apache.org/jira/browse/SPARK-9289 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Reporter: Reynold Xin > > {code} > [info] - read partitioned table - normal case (18 seconds, 557 milliseconds) > [info] - read partitioned table - partition key included in orc file (5 > seconds, 160 milliseconds) > [info] - read partitioned table - with nulls (4 seconds, 69 milliseconds) > [info] - read partitioned table - with nulls and partition keys are included > in Orc file (3 seconds, 218 milliseconds) > {code} > Does the unit test really need to run for 18 secs, 5 secs, 4 secs, and 3 secs? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13793) PipeRDD doesn't propagate exceptions while reading parent RDD
[ https://issues.apache.org/jira/browse/SPARK-13793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13793: Assignee: (was: Apache Spark) > PipeRDD doesn't propagate exceptions while reading parent RDD > - > > Key: SPARK-13793 > URL: https://issues.apache.org/jira/browse/SPARK-13793 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Tejas Patil >Priority: Minor > > PipeRDD creates a process to run the command and spawns a thread to feed the > input data to the process's stdin. If there is any exception in the child > thread which gets the input data from the parent RDD, the child thread does > not propagate that exception to the main thread. E.g., in the event of fetch > failures, since the exception is not propagated, the entire stage fails. > The correct behaviour would be to recompute the parent(s) and then relaunch > the stage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13793) PipeRDD doesn't propagate exceptions while reading parent RDD
[ https://issues.apache.org/jira/browse/SPARK-13793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188660#comment-15188660 ] Apache Spark commented on SPARK-13793: -- User 'tejasapatil' has created a pull request for this issue: https://github.com/apache/spark/pull/11628 > PipeRDD doesn't propagate exceptions while reading parent RDD > - > > Key: SPARK-13793 > URL: https://issues.apache.org/jira/browse/SPARK-13793 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Tejas Patil >Priority: Minor > > PipeRDD creates a process to run the command and spawns a thread to feed the > input data to the process's stdin. If there is any exception in the child > thread which gets the input data from the parent RDD, the child thread does > not propagate that exception to the main thread. E.g., in the event of fetch > failures, since the exception is not propagated, the entire stage fails. > The correct behaviour would be to recompute the parent(s) and then relaunch > the stage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13793) PipedRDD doesn't propagate exceptions while reading parent RDDd
[ https://issues.apache.org/jira/browse/SPARK-13793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-13793: Summary: PipedRDD doesn't propagate exceptions while reading parent RDD (was: PipeRDD doesn't propagate exceptions while reading parent RDD) > PipedRDD doesn't propagate exceptions while reading parent RDD > --- > > Key: SPARK-13793 > URL: https://issues.apache.org/jira/browse/SPARK-13793 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Tejas Patil >Priority: Minor > > PipeRDD creates a process to run the command and spawns a thread to feed the > input data to the process's stdin. If there is any exception in the child > thread which gets the input data from the parent RDD, the child thread does > not propagate that exception to the main thread. e.g., in the event of fetch > failures, since the exception is not propagated, the entire stage fails. > The correct behaviour would be to recompute the parent(s) and then relaunch > the stage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13793) PipeRDD doesn't propagate exceptions while reading parent RDD
[ https://issues.apache.org/jira/browse/SPARK-13793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13793: Assignee: Apache Spark > PipeRDD doesn't propagate exceptions while reading parent RDD > - > > Key: SPARK-13793 > URL: https://issues.apache.org/jira/browse/SPARK-13793 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Tejas Patil >Assignee: Apache Spark >Priority: Minor > > PipeRDD creates a process to run the command and spawns a thread to feed the > input data to the process's stdin. If there is any exception in the child > thread which gets the input data from the parent RDD, the child thread does > not propagate that exception to the main thread. e.g., in the event of fetch > failures, since the exception is not propagated, the entire stage fails. > The correct behaviour would be to recompute the parent(s) and then relaunch > the stage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
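The fix pattern the SPARK-13793 description calls for can be sketched in plain Java (this is an illustrative standalone example, not Spark's actual PipedRDD code; the class and method names are invented): the stdin-feeder thread records any failure in a shared slot instead of dying silently, and the main thread rethrows it after join(), so a fetch failure in the parent surfaces to the scheduler.

```java
import java.util.concurrent.atomic.AtomicReference;

public class FeederDemo {
    // Minimal sketch: the feeder thread stores its exception in a shared
    // AtomicReference; the consuming thread checks the slot after join()
    // and rethrows, preserving the original cause.
    public static String runFeeder() throws InterruptedException {
        AtomicReference<Throwable> childError = new AtomicReference<>();
        Thread feeder = new Thread(() -> {
            try {
                // stand-in for reading the parent RDD's iterator
                throw new RuntimeException("fetch failure in parent");
            } catch (Throwable t) {
                childError.set(t); // record instead of swallowing
            }
        });
        feeder.start();
        feeder.join();
        Throwable t = childError.get();
        if (t != null) {
            // propagate to the caller with the real cause attached
            throw new RuntimeException("feeder thread failed", t);
        }
        return "ok";
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            runFeeder();
        } catch (RuntimeException e) {
            System.out.println("propagated: " + e.getCause().getMessage());
        }
    }
}
```

With the exception visible in the main thread, the stage can be failed with the fetch-failure cause, which is what allows the scheduler to recompute the parent(s) and relaunch the stage.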
[jira] [Commented] (SPARK-13795) ClassCastException while attempting to show() a DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188636#comment-15188636 ] Ganesh Krishnan commented on SPARK-13795: - This is similar to this Scala bug: https://issues.scala-lang.org/browse/SI-6337 > ClassCast Exception while attempting to show() a DataFrame > -- > > Key: SPARK-13795 > URL: https://issues.apache.org/jira/browse/SPARK-13795 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 > Environment: Linux 14.04 LTS >Reporter: Ganesh Krishnan > > DataFrame Schema (by printSchema() ) is as follows > allDataJoined.printSchema() > |-- eventType: string (nullable = true) > |-- itemId: string (nullable = true) > |-- productId: string (nullable = true) > |-- productVersion: string (nullable = true) > |-- servicedBy: string (nullable = true) > |-- ACCOUNT_NAME: string (nullable = true) > |-- CONTENTGROUPID: string (nullable = true) > |-- PRODUCT_ID: string (nullable = true) > |-- PROFILE_ID: string (nullable = true) > |-- SALESADVISEREMAIL: string (nullable = true) > |-- businessName: string (nullable = true) > |-- contentGroupId: string (nullable = true) > |-- salesAdviserName: string (nullable = true) > |-- salesAdviserPhone: string (nullable = true) > There is NO column that has any datatype except String. There used to be > previously an inferred column of type long that was dropped > > DataFrame allDataJoined = whiteEventJoinedWithReference. 
>drop(rliDataFrame.col("occurredAtDate")); > allDataJoined.printSchema() : output above ^^ > Now > allDataJoined.show() throws the following exception vv > java.lang.ClassCastException: java.lang.Long cannot be cast to > java.lang.Integer > at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) > at scala.math.Ordering$Int$.compare(Ordering.scala:256) > at scala.math.Ordering$class.gt(Ordering.scala:97) > at scala.math.Ordering$Int$.gt(Ordering.scala:256) > at > org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:457) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:383) > at > org.apache.spark.sql.catalyst.expressions.And.eval(predicates.scala:238) > at > org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$create$2.apply(predicates.scala:38) > at > org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$create$2.apply(predicates.scala:38) > at > org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$prunePartitions$1.apply(DataSourceStrategy.scala:257) > at > org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$prunePartitions$1.apply(DataSourceStrategy.scala:257) > at > scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.filter(TraversableLike.scala:263) > at scala.collection.AbstractTraversable.filter(Traversable.scala:105) > at > org.apache.spark.sql.execution.datasources.DataSourceStrategy$.prunePartitions(DataSourceStrategy.scala:257) > at > org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:82) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at > 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > at > org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.makeBroadcastHashJoin(SparkStrategies.scala:88) > at > org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.apply(SparkStrategies.scala:97) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > at > org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrateg
[jira] [Created] (SPARK-13795) ClassCastException while attempting to show() a DataFrame
Ganesh Krishnan created SPARK-13795: --- Summary: ClassCast Exception while attempting to show() a DataFrame Key: SPARK-13795 URL: https://issues.apache.org/jira/browse/SPARK-13795 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.6.0 Environment: Linux 14.04 LTS Reporter: Ganesh Krishnan DataFrame Schema (by printSchema() ) is as follows allDataJoined.printSchema() |-- eventType: string (nullable = true) |-- itemId: string (nullable = true) |-- productId: string (nullable = true) |-- productVersion: string (nullable = true) |-- servicedBy: string (nullable = true) |-- ACCOUNT_NAME: string (nullable = true) |-- CONTENTGROUPID: string (nullable = true) |-- PRODUCT_ID: string (nullable = true) |-- PROFILE_ID: string (nullable = true) |-- SALESADVISEREMAIL: string (nullable = true) |-- businessName: string (nullable = true) |-- contentGroupId: string (nullable = true) |-- salesAdviserName: string (nullable = true) |-- salesAdviserPhone: string (nullable = true) There is NO column that has any datatype except String. There used to be previously an inferred column of type long that was dropped DataFrame allDataJoined = whiteEventJoinedWithReference. 
drop(rliDataFrame.col("occurredAtDate")); allDataJoined.printSchema() : output above ^^ Now allDataJoined.show() throws the following exception vv java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) at scala.math.Ordering$Int$.compare(Ordering.scala:256) at scala.math.Ordering$class.gt(Ordering.scala:97) at scala.math.Ordering$Int$.gt(Ordering.scala:256) at org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:457) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:383) at org.apache.spark.sql.catalyst.expressions.And.eval(predicates.scala:238) at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$create$2.apply(predicates.scala:38) at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$create$2.apply(predicates.scala:38) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$prunePartitions$1.apply(DataSourceStrategy.scala:257) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$prunePartitions$1.apply(DataSourceStrategy.scala:257) at scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.filter(TraversableLike.scala:263) at scala.collection.AbstractTraversable.filter(Traversable.scala:105) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.prunePartitions(DataSourceStrategy.scala:257) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:82) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at 
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) at org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.makeBroadcastHashJoin(SparkStrategies.scala:88) at org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.apply(SparkStrategies.scala:97) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) at org.apach
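The SPARK-13795 trace bottoms out in scala.runtime.BoxesRunTime.unboxToInt: a partition-pruning predicate compiled for an Integer column is evaluated against a value that is actually a Long (plausibly a leftover of the dropped inferred long column). The JVM-level failure mode can be reproduced in isolation (illustrative example only; the class name is invented and this is not Spark code):

```java
public class UnboxDemo {
    // A boxed Long cast to Integer fails at runtime with exactly the
    // reported error: java.lang.Long cannot be cast to java.lang.Integer.
    public static String attempt() {
        Object value = 10L;           // runtime value is a boxed Long
        try {
            int i = (Integer) value;  // unbox as if it were an Integer
            return "ok:" + i;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt()); // prints "ClassCastException"
    }
}
```

Casts between boxed numeric types never convert; they only succeed when the runtime class matches, which is why a schema/value mismatch surfaces this way rather than as a silent widening.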
[jira] [Deleted] (SPARK-10813) API design: high level class structuring regarding windowed and non-windowed streams
[ https://issues.apache.org/jira/browse/SPARK-10813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin deleted SPARK-10813: > API design: high level class structuring regarding windowed and non-windowed > streams > > > Key: SPARK-10813 > URL: https://issues.apache.org/jira/browse/SPARK-10813 > Project: Spark > Issue Type: Sub-task >Reporter: Reynold Xin >Assignee: Reynold Xin > > I can think of 3 high level alternatives for streaming data frames. See > https://github.com/rxin/spark/pull/17 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13146) API for managing streaming dataframes
[ https://issues.apache.org/jira/browse/SPARK-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13146. - Resolution: Fixed Fix Version/s: 2.0.0 > API for managing streaming dataframes > - > > Key: SPARK-13146 > URL: https://issues.apache.org/jira/browse/SPARK-13146 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Tathagata Das >Assignee: Tathagata Das > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-10819) Logical plan: determine logical operators needed
[ https://issues.apache.org/jira/browse/SPARK-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin deleted SPARK-10819: > Logical plan: determine logical operators needed > > > Key: SPARK-10819 > URL: https://issues.apache.org/jira/browse/SPARK-10819 > Project: Spark > Issue Type: Sub-task >Reporter: Reynold Xin > > Again, it would be great if we can just reuse Spark SQL's existing logical > plan. We might need to introduce new logical plans (e.g. windowing which is > different from Spark SQL's). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-10818) Query optimization: investigate whether we need a separate optimizer from Spark SQL's
[ https://issues.apache.org/jira/browse/SPARK-10818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin deleted SPARK-10818: > Query optimization: investigate whether we need a separate optimizer from > Spark SQL's > - > > Key: SPARK-10818 > URL: https://issues.apache.org/jira/browse/SPARK-10818 > Project: Spark > Issue Type: Sub-task >Reporter: Reynold Xin > > It would be great if we can just reuse Spark SQL's query optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13794) Rename DataFrameWriter.stream to DataFrameWriter.startStream
[ https://issues.apache.org/jira/browse/SPARK-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188598#comment-15188598 ] Apache Spark commented on SPARK-13794: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/11627 > Rename DataFrameWriter.stream DataFrameWriter.startStream > - > > Key: SPARK-13794 > URL: https://issues.apache.org/jira/browse/SPARK-13794 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > This makes it more obvious with the verb "start" that we are actually > starting some execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13794) Rename DataFrameWriter.stream to DataFrameWriter.startStream
[ https://issues.apache.org/jira/browse/SPARK-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13794: Assignee: Apache Spark (was: Reynold Xin) > Rename DataFrameWriter.stream DataFrameWriter.startStream > - > > Key: SPARK-13794 > URL: https://issues.apache.org/jira/browse/SPARK-13794 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > > This makes it more obvious with the verb "start" that we are actually > starting some execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13794) Rename DataFrameWriter.stream to DataFrameWriter.startStream
[ https://issues.apache.org/jira/browse/SPARK-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13794: Assignee: Reynold Xin (was: Apache Spark) > Rename DataFrameWriter.stream DataFrameWriter.startStream > - > > Key: SPARK-13794 > URL: https://issues.apache.org/jira/browse/SPARK-13794 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > This makes it more obvious with the verb "start" that we are actually > starting some execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13794) Rename DataFrameWriter.stream to DataFrameWriter.startStream
Reynold Xin created SPARK-13794: --- Summary: Rename DataFrameWriter.stream DataFrameWriter.startStream Key: SPARK-13794 URL: https://issues.apache.org/jira/browse/SPARK-13794 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin This makes it more obvious with the verb "start" that we are actually starting some execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12117) Column Aliases are Ignored in callUDF while using struct()
[ https://issues.apache.org/jira/browse/SPARK-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188593#comment-15188593 ] Liang-Chi Hsieh commented on SPARK-12117: - As I revisit this PR and find that this bug is already fixed in current codebase. I think we can close this now. > Column Aliases are Ignored in callUDF while using struct() > -- > > Key: SPARK-12117 > URL: https://issues.apache.org/jira/browse/SPARK-12117 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 >Reporter: Sachin Aggarwal > > case where this works: > val TestDoc1 = sqlContext.createDataFrame(Seq(("sachin aggarwal", "1"), > ("Rishabh", "2"))).toDF("myText", "id") > > TestDoc1.select(callUDF("mydef",struct($"myText".as("Text"),$"id".as("label"))).as("col1")).show > steps to reproduce error case: > 1)create a file copy following text--filename(a.json) > { "myText": "Sachin Aggarwal","id": "1"} > { "myText": "Rishabh","id": "2"} > 2)define a simple UDF > def mydef(r:Row)={println(r.schema); r.getAs("Text").asInstanceOf[String]} > 3)register the udf > sqlContext.udf.register("mydef" ,mydef _) > 4)read the input file > val TestDoc2=sqlContext.read.json("/tmp/a.json") > 5)make a call to UDF > TestDoc2.select(callUDF("mydef",struct($"myText".as("Text"),$"id".as("label"))).as("col1")).show > ERROR received: > java.lang.IllegalArgumentException: Field "Text" does not exist. 
> at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:234) > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:234) > at scala.collection.MapLike$class.getOrElse(MapLike.scala:128) > at scala.collection.AbstractMap.getOrElse(Map.scala:58) > at org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:233) > at > org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema.fieldIndex(rows.scala:212) > at org.apache.spark.sql.Row$class.getAs(Row.scala:325) > at org.apache.spark.sql.catalyst.expressions.GenericRow.getAs(rows.scala:191) > at > $line414.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$c57ec8bf9b0d5f6161b97741d596ff0wC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.mydef(:107) > at > $line419.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$c57ec8bf9b0d5f6161b97741d596ff0wC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:110) > at > $line419.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$c57ec8bf9b0d5f6161b97741d596ff0wC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:110) > at > org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:75) > at > org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:74) > at > org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:964) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(Unknown > Source) > at > org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$2.apply(basicOperators.scala:55) > at > org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$2.apply(basicOperators.scala:53) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at 
scala.collection.Iterator$$anon$10.next(Iterator.scala:312) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1848) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1848) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.schedul
[jira] [Comment Edited] (SPARK-12345) Mesos cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188589#comment-15188589 ] Eran Withana edited comment on SPARK-12345 at 3/10/16 3:20 AM: --- is the resolution to this issue available in Spark 1.6.0 release? I just used Spark 1.6.0 and got the following error in mesos logs, when it tried to run the task {code} I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1 I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 20160223-000314-3439362570-5050-631-S0 sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found {code} To provide more context, here is my spark-submit script {code} $SPARK_HOME/bin/spark-submit \ --class com.mycompany.SparkStarter \ --master mesos://mesos-dispatcher:7077 \ --name SparkStarterJob \ --driver-memory 1G \ --executor-memory 4G \ --deploy-mode cluster \ --total-executor-cores 1 \ --conf spark.mesos.executor.docker.image=echinthaka/mesos-spark:0.23.1-1.6.0-2.6 \ http://abc.com/spark-starter.jar {code} was (Author: eran.chinth...@gmail.com): is the resolution to this issue available in Spark 1.6.0 release? 
I just used Spark 1.6.0 and got the following error in mesos logs, when it tried to run the task {code} I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1 I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 20160223-000314-3439362570-5050-631-S0 sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found {code} To provide more context, here is my spark-submit script {code} $SPARK_HOME/bin/spark-submit \ `# main class to be run` \ --class com.mycompany.SparkStarter \ --master mesos://mesos-dispatcher:7077 \ --name SparkStarterJob \ --driver-memory 1G \ --executor-memory 4G \ --deploy-mode cluster \ --total-executor-cores 1 \ --conf spark.mesos.executor.docker.image=echinthaka/mesos-spark:0.23.1-1.6.0-2.6 \ http://abc.com/spark-starter.jar {code} > Mesos cluster mode is broken > > > Key: SPARK-12345 > URL: https://issues.apache.org/jira/browse/SPARK-12345 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Timothy Chen >Priority: Critical > Fix For: 1.6.0 > > > The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2. > The driver is confused about where SPARK_HOME is. It resolves > `mesos.executor.uri` or `spark.mesos.executor.home` relative to the > filesystem where the driver runs, which is wrong. > {code} > I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0 > I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave > 130bdc39-44e7-4256-8c22-602040d337f1-S1 > bin/spark-submit: line 27: > /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: > No such file or directory > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-12345) Mesos cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188589#comment-15188589 ] Eran Withana edited comment on SPARK-12345 at 3/10/16 3:19 AM: --- is the resolution to this issue available in Spark 1.6.0 release? I just used Spark 1.6.0 and got the following error in mesos logs, when it tried to run the task {code} I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1 I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 20160223-000314-3439362570-5050-631-S0 sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found {code} To provide more context, here is my spark-submit script {code} $SPARK_HOME/bin/spark-submit \ `# main class to be run` \ --class com.mycompany.SparkStarter \ --master mesos://mesos-dispatcher:7077 \ --name SparkStarterJob \ --driver-memory 1G \ --executor-memory 4G \ --deploy-mode cluster \ --total-executor-cores 1 \ --conf spark.mesos.executor.docker.image=echinthaka/mesos-spark:0.23.1-1.6.0-2.6 \ http://abc.com/spark-starter.jar {code} was (Author: eran.chinth...@gmail.com): is the resolution to this issue available in Spark 1.6.0 release? I just used Spark 1.6.0 and got the following error in mesos logs, when it tried to run the task {code} I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1 I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 20160223-000314-3439362570-5050-631-S0 sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found {code} > Mesos cluster mode is broken > > > Key: SPARK-12345 > URL: https://issues.apache.org/jira/browse/SPARK-12345 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Timothy Chen >Priority: Critical > Fix For: 1.6.0 > > > The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2. > The driver is confused about where SPARK_HOME is. 
It resolves > `mesos.executor.uri` or `spark.mesos.executor.home` relative to the > filesystem where the driver runs, which is wrong. > {code} > I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0 > I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave > 130bdc39-44e7-4256-8c22-602040d337f1-S1 > bin/spark-submit: line 27: > /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: > No such file or directory > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-12345) Mesos cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188589#comment-15188589 ] Eran Withana edited comment on SPARK-12345 at 3/10/16 3:16 AM: --- is the resolution to this issue available in Spark 1.6.0 release? I just used Spark 1.6.0 and got the following error in mesos logs, when it tried to run the task {code} I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1 I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 20160223-000314-3439362570-5050-631-S0 sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found {code} was (Author: eran.chinth...@gmail.com): is the resolution to this issue available in Spark 1.6.0 release? I just used Spark 1.6.0 and got the following error in mesos logs, when it tried to run the task ``` I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1 I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 20160223-000314-3439362570-5050-631-S0 sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found ``` > Mesos cluster mode is broken > > > Key: SPARK-12345 > URL: https://issues.apache.org/jira/browse/SPARK-12345 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Timothy Chen >Priority: Critical > Fix For: 1.6.0 > > > The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2. > The driver is confused about where SPARK_HOME is. It resolves > `mesos.executor.uri` or `spark.mesos.executor.home` relative to the > filesystem where the driver runs, which is wrong. 
> {code} > I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0 > I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave > 130bdc39-44e7-4256-8c22-602040d337f1-S1 > bin/spark-submit: line 27: > /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: > No such file or directory > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12345) Mesos cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188589#comment-15188589 ] Eran Withana commented on SPARK-12345: -- is the resolution to this issue available in Spark 1.6.0 release? I just used Spark 1.6.0 and got the following error in mesos logs, when it tried to run the task ``` I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1 I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 20160223-000314-3439362570-5050-631-S0 sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found ``` > Mesos cluster mode is broken > > > Key: SPARK-12345 > URL: https://issues.apache.org/jira/browse/SPARK-12345 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Timothy Chen >Priority: Critical > Fix For: 1.6.0 > > > The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2. > The driver is confused about where SPARK_HOME is. It resolves > `mesos.executor.uri` or `spark.mesos.executor.home` relative to the > filesystem where the driver runs, which is wrong. > {code} > I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0 > I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave > 130bdc39-44e7-4256-8c22-602040d337f1-S1 > bin/spark-submit: line 27: > /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: > No such file or directory > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13766) Inconsistent file extensions and omitted file extensions written by CSV, TEXT and JSON data sources
[ https://issues.apache.org/jira/browse/SPARK-13766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13766. - Resolution: Fixed Assignee: Hyukjin Kwon Fix Version/s: 2.0.0 > Inconsistent file extensions and omitted file extensions written by CSV, TEXT > and JSON data sources > --- > > Key: SPARK-13766 > URL: https://issues.apache.org/jira/browse/SPARK-13766 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 2.0.0 > > > Currently, the output (part-files) from CSV, TEXT and JSON data sources do > not have file extensions such as .csv, .txt and .json (except for compression > extensions such as .gz, .deflate and .bz2). > In addition, it looks like Parquet has extensions (in part-files) such as > .gz.parquet or .snappy.parquet according to compression codecs, whereas ORC > does not have such extensions but is just .orc. > So, in a simple view, currently the extensions are set as below: > {code} > TEXT, CSV and JSON - [.COMPRESSION_CODEC_NAME] > Parquet - [.COMPRESSION_CODEC_NAME].parquet > ORC - .orc > {code} > It would be great if we had consistent naming for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
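The consistent scheme this issue asks for amounts to one rule: every data source appends the compression codec's extension (if any) first, then its own format extension, as Parquet already does. A minimal sketch of that rule, assuming an illustrative helper name (this is not Spark's API):

```python
# Hypothetical sketch of the naming rule proposed above: each part-file gets
# "<base>[.codec].<format>", so TEXT/CSV/JSON would behave like Parquet.
def part_file_name(base, fmt, codec=None):
    codec_ext = "." + codec if codec else ""
    return base + codec_ext + "." + fmt
```

Under this rule, a gzipped CSV part-file would be named `part-00000.gz.csv` and an uncompressed ORC file `part-00000.orc`, matching the existing Parquet convention.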
[jira] [Resolved] (SPARK-9289) OrcPartitionDiscoverySuite is slow to run
[ https://issues.apache.org/jira/browse/SPARK-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9289. Resolution: Not A Problem > OrcPartitionDiscoverySuite is slow to run > - > > Key: SPARK-9289 > URL: https://issues.apache.org/jira/browse/SPARK-9289 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Reporter: Reynold Xin > > {code} > [info] - read partitioned table - normal case (18 seconds, 557 milliseconds) > [info] - read partitioned table - partition key included in orc file (5 > seconds, 160 milliseconds) > [info] - read partitioned table - with nulls (4 seconds, 69 milliseconds) > [info] - read partitioned table - with nulls and partition keys are included > in Orc file (3 seconds, 218 milliseconds) > {code} > Does the unit test really need to run for 18 secs, 5 secs, 4 secs, and 3 secs? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9289) OrcPartitionDiscoverySuite is slow to run
[ https://issues.apache.org/jira/browse/SPARK-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188583#comment-15188583 ] Reynold Xin commented on SPARK-9289: Still pretty long but let me close this. > OrcPartitionDiscoverySuite is slow to run > - > > Key: SPARK-9289 > URL: https://issues.apache.org/jira/browse/SPARK-9289 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Reporter: Reynold Xin > > {code} > [info] - read partitioned table - normal case (18 seconds, 557 milliseconds) > [info] - read partitioned table - partition key included in orc file (5 > seconds, 160 milliseconds) > [info] - read partitioned table - with nulls (4 seconds, 69 milliseconds) > [info] - read partitioned table - with nulls and partition keys are included > in Orc file (3 seconds, 218 milliseconds) > {code} > Does the unit test really need to run for 18 secs, 5 secs, 4 secs, and 3 secs? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13793) PipeRDD doesn't propagate exceptions while reading parent RDD
Tejas Patil created SPARK-13793: --- Summary: PipeRDD doesn't propagate exceptions while reading parent RDD Key: SPARK-13793 URL: https://issues.apache.org/jira/browse/SPARK-13793 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.6.0 Reporter: Tejas Patil Priority: Minor PipeRDD creates a process to run the command and spawns a thread to feed the input data to the process's stdin. If there is any exception in the child thread which reads the input data from the parent RDD, the child thread does not propagate that exception to the main thread. E.g. in the event of fetch failures, since the exception is not propagated, the entire stage fails. The correct behaviour would be to recompute the parent(s) and then relaunch the stage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
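The fix this report implies is the standard pattern for surfacing a worker-thread failure: the feeder thread records any exception it hits, and the main thread re-raises it after joining instead of letting the failure disappear. A language-agnostic sketch in Python (class and method names are illustrative, not Spark's code):

```python
import threading

class FeederThread(threading.Thread):
    """Stands in for PipeRDD's stdin-feeder thread: capture the exception,
    don't swallow it."""
    def __init__(self, records, sink):
        super().__init__(daemon=True)
        self.records, self.sink, self.exc = records, sink, None

    def run(self):
        try:
            for r in self.records:      # iterating the parent RDD may throw
                self.sink.append(r)     # stands in for writing to the process's stdin
        except Exception as e:
            self.exc = e                # remember it for the main thread

    def join_and_reraise(self):
        self.join()
        if self.exc is not None:
            raise self.exc              # propagate, e.g. a fetch failure
```

With this pattern, an exception raised while reading the parent reaches the caller, which can then trigger recomputation of the parent(s) rather than failing the whole stage outright.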
[jira] [Commented] (SPARK-7420) Flaky test: o.a.s.streaming.JobGeneratorSuite "Do not clear received block data too soon"
[ https://issues.apache.org/jira/browse/SPARK-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188574#comment-15188574 ] Apache Spark commented on SPARK-7420: - User 'lw-lin' has created a pull request for this issue: https://github.com/apache/spark/pull/11626 > Flaky test: o.a.s.streaming.JobGeneratorSuite "Do not clear received block > data too soon" > - > > Key: SPARK-7420 > URL: https://issues.apache.org/jira/browse/SPARK-7420 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 1.3.1, 1.4.0 >Reporter: Andrew Or >Assignee: Tathagata Das >Priority: Critical > Labels: flaky-test > > {code} > The code passed to eventually never returned normally. Attempted 18 times > over 10.13803606001 seconds. Last failure message: > receiverTracker.hasUnallocatedBlocks was false. > {code} > It seems to be failing only in maven. > https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-Maven-pre-YARN/hadoop.version=2.0.0-mr1-cdh4.1.2,label=centos/458/ > https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/459/ > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2173/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-13760) Fix BigDecimal constructor for FloatType
[ https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai reopened SPARK-13760: -- > Fix BigDecimal constructor for FloatType > > > Key: SPARK-13760 > URL: https://issues.apache.org/jira/browse/SPARK-13760 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal >Priority: Trivial > > Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The > latter is deprecated and can result in inconsistencies due to an implicit > conversion to `Double`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9289) OrcPartitionDiscoverySuite is slow to run
[ https://issues.apache.org/jira/browse/SPARK-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188567#comment-15188567 ] Dongjoon Hyun commented on SPARK-9289: -- Hi, [~rxin]. As of today, this issue seems to be solved. If so, could you close this issue? * Notebook {code} $ build/sbt "project hive" "test-only *OrcPartitionDiscoverySuite -- -z partitioned" ... [info] OrcPartitionDiscoverySuite: [info] - read partitioned table - normal case (4 seconds, 427 milliseconds) [info] - read partitioned table - partition key included in orc file (1 second, 419 milliseconds) [info] - read partitioned table - with nulls (911 milliseconds) [info] - read partitioned table - with nulls and partition keys are included in Orc file (747 milliseconds) {code} * Jenkins {code} [info] OrcPartitionDiscoverySuite: [info] - read partitioned table - normal case (1 second, 745 milliseconds) [info] - read partitioned table - partition key included in orc file (1 second, 961 milliseconds) [info] - read partitioned table - with nulls (1 second, 243 milliseconds) [info] - read partitioned table - with nulls and partition keys are included in Orc file (1 second, 1 millisecond) {code} > OrcPartitionDiscoverySuite is slow to run > - > > Key: SPARK-9289 > URL: https://issues.apache.org/jira/browse/SPARK-9289 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Reporter: Reynold Xin > > {code} > [info] - read partitioned table - normal case (18 seconds, 557 milliseconds) > [info] - read partitioned table - partition key included in orc file (5 > seconds, 160 milliseconds) > [info] - read partitioned table - with nulls (4 seconds, 69 milliseconds) > [info] - read partitioned table - with nulls and partition keys are included > in Orc file (3 seconds, 218 milliseconds) > {code} > Does the unit test really need to run for 18 secs, 5 secs, 4 secs, and 3 secs?
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13760) Fix BigDecimal constructor for FloatType
[ https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-13760. -- Resolution: Later > Fix BigDecimal constructor for FloatType > > > Key: SPARK-13760 > URL: https://issues.apache.org/jira/browse/SPARK-13760 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal >Priority: Trivial > > Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The > latter is deprecated and can result in inconsistencies due to an implicit > conversion to `Double`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13760) Fix BigDecimal constructor for FloatType
[ https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188568#comment-15188568 ] Yin Huai commented on SPARK-13760: -- Set the resolution to Later. We may want to revisit it after we drop Scala 2.10 support. > Fix BigDecimal constructor for FloatType > > > Key: SPARK-13760 > URL: https://issues.apache.org/jira/browse/SPARK-13760 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal >Priority: Trivial > > Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The > latter is deprecated and can result in inconsistencies due to an implicit > conversion to `Double`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-13760) Fix BigDecimal constructor for FloatType
[ https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sameer Agarwal closed SPARK-13760. -- Resolution: Won't Fix > Fix BigDecimal constructor for FloatType > > > Key: SPARK-13760 > URL: https://issues.apache.org/jira/browse/SPARK-13760 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal >Priority: Trivial > > Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The > latter is deprecated and can result in inconsistencies due to an implicit > conversion to `Double`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13760) Fix BigDecimal constructor for FloatType
[ https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13760: Assignee: Apache Spark (was: Sameer Agarwal) > Fix BigDecimal constructor for FloatType > > > Key: SPARK-13760 > URL: https://issues.apache.org/jira/browse/SPARK-13760 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Apache Spark >Priority: Trivial > > Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The > latter is deprecated and can result in inconsistencies due to an implicit > conversion to `Double`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13760) Fix BigDecimal constructor for FloatType
[ https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13760: Assignee: Sameer Agarwal (was: Apache Spark) > Fix BigDecimal constructor for FloatType > > > Key: SPARK-13760 > URL: https://issues.apache.org/jira/browse/SPARK-13760 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal >Priority: Trivial > > Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The > latter is deprecated and can result in inconsistencies due to an implicit > conversion to `Double`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188556#comment-15188556 ] Zhongshuai Pei commented on SPARK-4105: --- I had this happen in Spark 1.5.2 [~joshrosen] [~daniel.siegmann.aol] > FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based > shuffle > - > > Key: SPARK-4105 > URL: https://issues.apache.org/jira/browse/SPARK-4105 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.2.0, 1.2.1, 1.3.0, 1.4.1 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Blocker > Attachments: JavaObjectToSerialize.java, > SparkFailedToUncompressGenerator.scala > > > We have seen non-deterministic {{FAILED_TO_UNCOMPRESS(5)}} errors during > shuffle read. Here's a sample stacktrace from an executor: > {code} > 14/10/23 18:34:11 ERROR Executor: Exception in task 1747.3 in stage 11.0 (TID > 33053) > java.io.IOException: FAILED_TO_UNCOMPRESS(5) > at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78) > at org.xerial.snappy.SnappyNative.rawUncompress(Native Method) > at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391) > at org.xerial.snappy.Snappy.uncompress(Snappy.java:427) > at > org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:127) > at > org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88) > at org.xerial.snappy.SnappyInputStream.(SnappyInputStream.java:58) > at > org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128) > at > org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1090) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115) > at > 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at > org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129) > at > org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159) > at > org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > at > org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > at > org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > 
at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745)
[jira] [Reopened] (SPARK-13760) Fix BigDecimal constructor for FloatType
[ https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai reopened SPARK-13760: -- > Fix BigDecimal constructor for FloatType > > > Key: SPARK-13760 > URL: https://issues.apache.org/jira/browse/SPARK-13760 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal >Priority: Trivial > > Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The > latter is deprecated and can result in inconsistencies due to an implicit > conversion to `Double`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13760) Fix BigDecimal constructor for FloatType
[ https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188555#comment-15188555 ] Yin Huai commented on SPARK-13760: -- It seems https://github.com/apache/spark/pull/11597 broke the Scala 2.10 build, so I have reverted it. > Fix BigDecimal constructor for FloatType > > > Key: SPARK-13760 > URL: https://issues.apache.org/jira/browse/SPARK-13760 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal >Priority: Trivial > > Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The > latter is deprecated and can result in inconsistencies due to an implicit > conversion to `Double`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13760) Fix BigDecimal constructor for FloatType
[ https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-13760: - Fix Version/s: (was: 1.6.2) (was: 2.0.0) > Fix BigDecimal constructor for FloatType > > > Key: SPARK-13760 > URL: https://issues.apache.org/jira/browse/SPARK-13760 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal >Priority: Trivial > > Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The > latter is deprecated and can result in inconsistencies due to an implicit > conversion to `Double`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13792) Limit logging of bad records
Hossein Falaki created SPARK-13792: -- Summary: Limit logging of bad records Key: SPARK-13792 URL: https://issues.apache.org/jira/browse/SPARK-13792 Project: Spark Issue Type: Sub-task Reporter: Hossein Falaki Currently, in PERMISSIVE and DROPMALFORMED modes, we log every record that is going to be ignored. This can generate a lot of logs with large datasets. A better idea is to log the first bad record and the number of subsequent ones for each partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
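The proposed policy is simple to state: per partition, record the first malformed record verbatim and only count the rest. A minimal sketch of that throttle (class and method names are illustrative, not the Spark implementation):

```python
class BadRecordLogger:
    """Per-partition throttle: keep the first bad record, count the rest."""
    def __init__(self):
        self.first = None
        self.more = 0

    def report(self, record):
        if self.first is None:
            self.first = record   # only this one would be logged in full
        else:
            self.more += 1        # subsequent bad records are just counted

    def summary(self):
        # one line emitted when the partition finishes, instead of one per record
        return "first malformed record: %r (%d more ignored)" % (self.first, self.more)
```

One instance per partition keeps log volume bounded by the partition count rather than by the number of malformed records.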
[jira] [Updated] (SPARK-13760) Fix BigDecimal constructor for FloatType
[ https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-13760: - Assignee: Sameer Agarwal > Fix BigDecimal constructor for FloatType > > > Key: SPARK-13760 > URL: https://issues.apache.org/jira/browse/SPARK-13760 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal >Priority: Trivial > Fix For: 1.6.2, 2.0.0 > > > Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The > latter is deprecated and can result in inconsistencies due to an implicit > conversion to `Double`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13760) Fix BigDecimal constructor for FloatType
[ https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-13760: - Fix Version/s: (was: 1.6.1) 1.6.2 > Fix BigDecimal constructor for FloatType > > > Key: SPARK-13760 > URL: https://issues.apache.org/jira/browse/SPARK-13760 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Priority: Trivial > Fix For: 1.6.2, 2.0.0 > > > Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The > latter is deprecated and can result in inconsistencies due to an implicit > conversion to `Double`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13760) Fix BigDecimal constructor for FloatType
[ https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-13760. -- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved by pull request 11597 [https://github.com/apache/spark/pull/11597] > Fix BigDecimal constructor for FloatType > > > Key: SPARK-13760 > URL: https://issues.apache.org/jira/browse/SPARK-13760 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Priority: Trivial > Fix For: 2.0.0, 1.6.1 > > > Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The > latter is deprecated and can result in inconsistencies due to an implicit > conversion to `Double`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13492) Configure a custom webui_url for the Spark Mesos Framework
[ https://issues.apache.org/jira/browse/SPARK-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13492: -- Assignee: Sergiusz Urbaniak (was: Andrew Or) > Configure a custom webui_url for the Spark Mesos Framework > -- > > Key: SPARK-13492 > URL: https://issues.apache.org/jira/browse/SPARK-13492 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Sergiusz Urbaniak >Assignee: Sergiusz Urbaniak >Priority: Minor > Fix For: 2.0.0 > > > Previously the Mesos framework webui URL was being derived only from the > Spark UI address leaving no possibility to configure it. This issue proposes > to make it configurable. If unset it falls back to the previous behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13492) Configure a custom webui_url for the Spark Mesos Framework
[ https://issues.apache.org/jira/browse/SPARK-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-13492: - Assignee: Andrew Or > Configure a custom webui_url for the Spark Mesos Framework > -- > > Key: SPARK-13492 > URL: https://issues.apache.org/jira/browse/SPARK-13492 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Sergiusz Urbaniak >Assignee: Andrew Or >Priority: Minor > Fix For: 2.0.0 > > > Previously the Mesos framework webui URL was being derived only from the > Spark UI address leaving no possibility to configure it. This issue proposes > to make it configurable. If unset it falls back to the previous behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13492) Configure a custom webui_url for the Spark Mesos Framework
[ https://issues.apache.org/jira/browse/SPARK-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13492. --- Resolution: Fixed Fix Version/s: 2.0.0 > Configure a custom webui_url for the Spark Mesos Framework > -- > > Key: SPARK-13492 > URL: https://issues.apache.org/jira/browse/SPARK-13492 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Sergiusz Urbaniak >Priority: Minor > Fix For: 2.0.0 > > > Previously the Mesos framework webui URL was being derived only from the > Spark UI address leaving no possibility to configure it. This issue proposes > to make it configurable. If unset it falls back to the previous behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13775) history server sort by completed time by default
[ https://issues.apache.org/jira/browse/SPARK-13775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13775. --- Resolution: Fixed Assignee: Zhuo Liu Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > history server sort by completed time by default > > > Key: SPARK-13775 > URL: https://issues.apache.org/jira/browse/SPARK-13775 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Thomas Graves >Assignee: Zhuo Liu >Priority: Trivial > Fix For: 2.0.0 > > > The new history server UI using DataTables sorts by application ID. Let's > change it to sort by completed time, as the old table format did. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13778) Master's ApplicationPage displays wrong application executor state when a worker is lost
[ https://issues.apache.org/jira/browse/SPARK-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13778. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Master's ApplicationPage displays wrong application executor state when a > worker is lost > > > Key: SPARK-13778 > URL: https://issues.apache.org/jira/browse/SPARK-13778 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 2.0.0 > > > When a worker is lost, the executors on this worker are also lost. But > Master's ApplicationPage still displays their states as running. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13791) Add MetadataLog and HDFSMetadataLog
[ https://issues.apache.org/jira/browse/SPARK-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188467#comment-15188467 ] Apache Spark commented on SPARK-13791: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/11625 > Add MetadataLog and HDFSMetadataLog > --- > > Key: SPARK-13791 > URL: https://issues.apache.org/jira/browse/SPARK-13791 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > > - Add a MetadataLog interface for reliable metadata storage. > - Add HDFSMetadataLog as a MetadataLog implementation based on HDFS. > - Update FileStreamSource to use HDFSMetadataLog instead of managing metadata > by itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13791) Add MetadataLog and HDFSMetadataLog
[ https://issues.apache.org/jira/browse/SPARK-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13791: Assignee: Shixiong Zhu (was: Apache Spark) > Add MetadataLog and HDFSMetadataLog > --- > > Key: SPARK-13791 > URL: https://issues.apache.org/jira/browse/SPARK-13791 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > > - Add a MetadataLog interface for reliable metadata storage. > - Add HDFSMetadataLog as a MetadataLog implementation based on HDFS. > - Update FileStreamSource to use HDFSMetadataLog instead of managing metadata > by itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13791) Add MetadataLog and HDFSMetadataLog
[ https://issues.apache.org/jira/browse/SPARK-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13791: Assignee: Apache Spark (was: Shixiong Zhu) > Add MetadataLog and HDFSMetadataLog > --- > > Key: SPARK-13791 > URL: https://issues.apache.org/jira/browse/SPARK-13791 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Shixiong Zhu >Assignee: Apache Spark > > - Add a MetadataLog interface for reliable metadata storage. > - Add HDFSMetadataLog as a MetadataLog implementation based on HDFS. > - Update FileStreamSource to use HDFSMetadataLog instead of managing metadata > by itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13791) Add MetadataLog and HDFSMetadataLog
Shixiong Zhu created SPARK-13791: Summary: Add MetadataLog and HDFSMetadataLog Key: SPARK-13791 URL: https://issues.apache.org/jira/browse/SPARK-13791 Project: Spark Issue Type: Improvement Components: SQL Reporter: Shixiong Zhu Assignee: Shixiong Zhu - Add a MetadataLog interface for reliable metadata storage. - Add HDFSMetadataLog as a MetadataLog implementation based on HDFS. - Update FileStreamSource to use HDFSMetadataLog instead of managing metadata by itself.
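The contract described above (reliable, per-batch metadata storage with an HDFS-backed implementation) can be sketched roughly as follows. This is a hypothetical, in-memory illustration: the interface name matches the issue, but the method signatures and the `InMemoryMetadataLog` class are illustrative, not the actual Spark API; the real `HDFSMetadataLog` would persist each batch's metadata as a file on HDFS.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch of a MetadataLog-style contract (signatures are
// illustrative, not the actual Spark API).
interface MetadataLog<T> {
    boolean add(long batchId, T metadata);       // record a batch's metadata at most once
    Optional<T> get(long batchId);
    Optional<Map.Entry<Long, T>> getLatest();    // highest batch id seen so far
}

// Toy in-memory implementation standing in for an HDFS-backed one.
class InMemoryMetadataLog<T> implements MetadataLog<T> {
    private final TreeMap<Long, T> entries = new TreeMap<>();

    @Override
    public boolean add(long batchId, T metadata) {
        // A batch is written exactly once; re-adding is a no-op returning false.
        return entries.putIfAbsent(batchId, metadata) == null;
    }

    @Override
    public Optional<T> get(long batchId) {
        return Optional.ofNullable(entries.get(batchId));
    }

    @Override
    public Optional<Map.Entry<Long, T>> getLatest() {
        return Optional.ofNullable(entries.lastEntry());
    }
}
```

The write-once `add` is the property that lets a source such as FileStreamSource recover deterministically after restart: replaying a batch id returns the already-recorded metadata instead of overwriting it.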
[jira] [Issue Comment Deleted] (SPARK-13782) Model export/import for spark.ml: BisectingKMeans
[ https://issues.apache.org/jira/browse/SPARK-13782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-13782: -- Comment: was deleted (was: Hi, [~josephkb]. May I work on this issue?) > Model export/import for spark.ml: BisectingKMeans > - > > Key: SPARK-13782 > URL: https://issues.apache.org/jira/browse/SPARK-13782 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Joseph K. Bradley > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-13747. -- Resolution: Fixed Fix Version/s: 2.0.0 > Concurrent execution in SQL doesn't work with Scala ForkJoinPool > > > Key: SPARK-13747 > URL: https://issues.apache.org/jira/browse/SPARK-13747 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Shixiong Zhu >Assignee: Andrew Or > Fix For: 2.0.0 > > > Running the following code may fail > {code} > (1 to 100).par.foreach { _ => > println(sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()) > } > java.lang.IllegalArgumentException: spark.sql.execution.id is already set > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87) > > at > org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) > at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) > {code} > This is because SparkContext.runJob can be suspended when using a > ForkJoinPool (e.g., scala.concurrent.ExecutionContext.Implicits.global) as it > calls Await.ready (introduced by https://github.com/apache/spark/pull/9264). > So when SparkContext.runJob is suspended, ForkJoinPool will run another task > in the same thread; as a result, the thread-local properties have been polluted.
[jira] [Assigned] (SPARK-13790) Speed up ColumnVector's getDecimal
[ https://issues.apache.org/jira/browse/SPARK-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13790: Assignee: Apache Spark > Speed up ColumnVector's getDecimal > -- > > Key: SPARK-13790 > URL: https://issues.apache.org/jira/browse/SPARK-13790 > Project: Spark > Issue Type: Improvement >Reporter: Nong Li >Assignee: Apache Spark >Priority: Minor > > This should reuse a decimal object for the simple case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13790) Speed up ColumnVector's getDecimal
[ https://issues.apache.org/jira/browse/SPARK-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188451#comment-15188451 ] Apache Spark commented on SPARK-13790: -- User 'nongli' has created a pull request for this issue: https://github.com/apache/spark/pull/11624 > Speed up ColumnVector's getDecimal > -- > > Key: SPARK-13790 > URL: https://issues.apache.org/jira/browse/SPARK-13790 > Project: Spark > Issue Type: Improvement >Reporter: Nong Li >Priority: Minor > > This should reuse a decimal object for the simple case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13790) Speed up ColumnVector's getDecimal
[ https://issues.apache.org/jira/browse/SPARK-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13790: Assignee: (was: Apache Spark) > Speed up ColumnVector's getDecimal > -- > > Key: SPARK-13790 > URL: https://issues.apache.org/jira/browse/SPARK-13790 > Project: Spark > Issue Type: Improvement >Reporter: Nong Li >Priority: Minor > > This should reuse a decimal object for the simple case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12555) Datasets: data is corrupted when input data is reordered
[ https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188447#comment-15188447 ] Luciano Resende commented on SPARK-12555: - This issue is still reproducible in Spark 1.6.x but seems resolved in 2.x. I have added a test case in trunk (PR #11623) to avoid future regression, but please let us know if there is a need to backport fixes. > Datasets: data is corrupted when input data is reordered > > > Key: SPARK-12555 > URL: https://issues.apache.org/jira/browse/SPARK-12555 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: ALL platforms on 1.6 >Reporter: Tim Preece > Labels: big-endian > > Testcase > --- > {code} > import org.apache.spark.sql.expressions.Aggregator > import org.apache.spark.{SparkConf, SparkContext} > import org.apache.spark.sql.SQLContext > import org.apache.spark.sql.Dataset > case class people(age: Int, name: String) > object nameAgg extends Aggregator[people, String, String] { > def zero: String = "" > def reduce(b: String, a: people): String = a.name + b > def merge(b1: String, b2: String): String = b1 + b2 > def finish(r: String): String = r > } > object DataSetAgg { > def main(args: Array[String]) { > val conf = new SparkConf().setAppName("DataSetAgg") > val spark = new SparkContext(conf) > val sqlContext = new SQLContext(spark) > import sqlContext.implicits._ > val peopleds: Dataset[people] = sqlContext.sql("SELECT 'Tim Preece' AS > name, 1279869254 AS age").as[people] > peopleds.groupBy(_.age).agg(nameAgg.toColumn).show() > } > } > {code} > Result ( on a Little Endian Platform ) > > {noformat} > +--+--+ > |_1|_2| > +--+--+ > |1279869254|FAILTi| > +--+--+ > {noformat} > Explanation > --- > Internally the String variable in the unsafe row is not updated after an > unsafe row join operation. 
> The displayed string is corrupted and shows part of the integer ( interpreted > as a string ) along with "Ti" > The column names also look different on a Little Endian platform. > Result ( on a Big Endian Platform ) > {noformat} > +--+--+ > | value|nameAgg$(name,age)| > +--+--+ > |1279869254|LIAFTi| > +--+--+ > {noformat} > The following Unit test also fails ( but only explicitly on a Big Endian > platform ) > {code} > org.apache.spark.sql.DatasetAggregatorSuite > - typed aggregation: class input with reordering *** FAILED *** > Results do not match for query: > == Parsed Logical Plan == > Aggregate [value#748], > [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS > ClassInputAgg$(b,a)#762] > +- AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] > +- Project [one AS b#650,1 AS a#651] > +- OneRowRelation$ > > == Analyzed Logical Plan == > value: string, ClassInputAgg$(b,a): int > Aggregate [value#748], > [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS > ClassInputAgg$(b,a)#762] > +- AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] > +- Project [one AS b#650,1 AS a#651] > +- OneRowRelation$ > > == Optimized Logical Plan == > Aggregate [value#748], > [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS > ClassInputAgg$(b,a)#762] > +- AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] > +- Project [one AS b#650,1 AS a#651] > +- OneRowRelation$ > > == Physical Plan == > TungstenAggregate(key=[value#748], > functions=[(ClassInputAgg$(b#650,a#651),mode=Final,isDistinct=false)], > output=[value#748,ClassInputAgg$(b,a)#762]) > +- TungstenExchange hashpartitioning(value#748,5), None > +- TungstenAggregate(key=[value#748], > functions=[(ClassInputAgg$(b#650,a#651),mode=Partial,isDistinct=false)], > output=[value#748,value#758]) > +- !AppendColumns , class[a[0]: int, b[0]: string], > 
class[value[0]: string], [value#748] >+- Project [one AS b#650,1 AS a#651] > +- Scan OneRowRelation[] > == Results == > !== Correct Answer - 1 == == Spark Answer - 1 == > ![one,1][one,9] (QueryTest.scala:127) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13790) Speed up ColumnVector's getDecimal
Nong Li created SPARK-13790: --- Summary: Speed up ColumnVector's getDecimal Key: SPARK-13790 URL: https://issues.apache.org/jira/browse/SPARK-13790 Project: Spark Issue Type: Improvement Reporter: Nong Li Priority: Minor This should reuse a decimal object for the simple case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
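The "reuse a decimal object" idea in SPARK-13790 is a standard allocation-avoidance pattern: keep one mutable instance per column vector, reset it on each call, and return it. The sketch below is illustrative only (the class names and signatures are not the actual `ColumnVector` code); the trade-off it demonstrates is that callers must copy the value if they need to retain it past the next `getDecimal` call.

```java
import java.math.BigDecimal;

// Illustrative mutable decimal holder (not Spark's Decimal class).
final class MutableDecimal {
    private long unscaled;
    private int scale;

    MutableDecimal set(long unscaledValue, int newScale) {
        this.unscaled = unscaledValue;
        this.scale = newScale;
        return this;
    }

    BigDecimal toBigDecimal() {
        return BigDecimal.valueOf(unscaled, scale);
    }
}

// Hypothetical column of decimals backed by unscaled longs.
final class LongDecimalColumn {
    private final long[] values;
    private final int scale;
    private final MutableDecimal reused = new MutableDecimal(); // one allocation total

    LongDecimalColumn(long[] values, int scale) {
        this.values = values;
        this.scale = scale;
    }

    MutableDecimal getDecimal(int row) {
        return reused.set(values[row], scale); // no per-row allocation
    }
}
```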
[jira] [Assigned] (SPARK-12555) Datasets: data is corrupted when input data is reordered
[ https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12555: Assignee: (was: Apache Spark) > Datasets: data is corrupted when input data is reordered > > > Key: SPARK-12555 > URL: https://issues.apache.org/jira/browse/SPARK-12555 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: ALL platforms on 1.6 >Reporter: Tim Preece > Labels: big-endian
[jira] [Commented] (SPARK-12555) Datasets: data is corrupted when input data is reordered
[ https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188438#comment-15188438 ] Apache Spark commented on SPARK-12555: -- User 'lresende' has created a pull request for this issue: https://github.com/apache/spark/pull/11623 > Datasets: data is corrupted when input data is reordered > > > Key: SPARK-12555 > URL: https://issues.apache.org/jira/browse/SPARK-12555 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: ALL platforms on 1.6 >Reporter: Tim Preece > Labels: big-endian
[jira] [Assigned] (SPARK-12555) Datasets: data is corrupted when input data is reordered
[ https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12555: Assignee: Apache Spark > Datasets: data is corrupted when input data is reordered > > > Key: SPARK-12555 > URL: https://issues.apache.org/jira/browse/SPARK-12555 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: ALL platforms on 1.6 >Reporter: Tim Preece >Assignee: Apache Spark > Labels: big-endian
[jira] [Assigned] (SPARK-13787) Feature importances for decision trees in Python
[ https://issues.apache.org/jira/browse/SPARK-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13787: Assignee: (was: Apache Spark) > Feature importances for decision trees in Python > > > Key: SPARK-13787 > URL: https://issues.apache.org/jira/browse/SPARK-13787 > Project: Spark > Issue Type: New Feature > Components: ML, PySpark >Reporter: Joseph K. Bradley > > Expose feature importances for pyspark.ml DecisionTreeClassifier, > DecisionTreeRegressor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13787) Feature importances for decision trees in Python
[ https://issues.apache.org/jira/browse/SPARK-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188414#comment-15188414 ] Apache Spark commented on SPARK-13787: -- User 'sethah' has created a pull request for this issue: https://github.com/apache/spark/pull/11622 > Feature importances for decision trees in Python > > > Key: SPARK-13787 > URL: https://issues.apache.org/jira/browse/SPARK-13787 > Project: Spark > Issue Type: New Feature > Components: ML, PySpark >Reporter: Joseph K. Bradley > > Expose feature importances for pyspark.ml DecisionTreeClassifier, > DecisionTreeRegressor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13787) Feature importances for decision trees in Python
[ https://issues.apache.org/jira/browse/SPARK-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13787: Assignee: Apache Spark > Feature importances for decision trees in Python > > > Key: SPARK-13787 > URL: https://issues.apache.org/jira/browse/SPARK-13787 > Project: Spark > Issue Type: New Feature > Components: ML, PySpark >Reporter: Joseph K. Bradley >Assignee: Apache Spark > > Expose feature importances for pyspark.ml DecisionTreeClassifier, > DecisionTreeRegressor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13311) prettyString of IN is not good
[ https://issues.apache.org/jira/browse/SPARK-13311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188412#comment-15188412 ] Apache Spark commented on SPARK-13311: -- User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/11514 > prettyString of IN is not good > -- > > Key: SPARK-13311 > URL: https://issues.apache.org/jira/browse/SPARK-13311 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Priority: Minor > > In(i_class,[Ljava.lang.Object;@1a575883)) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
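The unreadable `In(i_class,[Ljava.lang.Object;@1a575883)` in the report is what you get from calling `toString()` on a JVM array: its type descriptor and identity hash, not its elements. A hedged sketch of the kind of fix involved is to render the value list element by element (`prettyIn` is an illustrative helper, not the actual Catalyst method):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class Pretty {
    // Render an IN predicate with its values spelled out, e.g.
    // "i_class IN (a, b, c)" instead of the array's default toString().
    static String prettyIn(String column, Object[] values) {
        return Arrays.stream(values)
                .map(String::valueOf)
                .collect(Collectors.joining(", ", column + " IN (", ")"));
    }
}
```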
[jira] [Updated] (SPARK-13125) makes the ratio of KafkaRDD partition to kafka topic partition configurable.
[ https://issues.apache.org/jira/browse/SPARK-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengcanbin updated SPARK-13125: Attachment: 13134.patch > makes the ratio of KafkaRDD partition to kafka topic partition configurable. > - > > Key: SPARK-13125 > URL: https://issues.apache.org/jira/browse/SPARK-13125 > Project: Spark > Issue Type: Improvement > Components: Input/Output >Affects Versions: 1.6.1 >Reporter: zhengcanbin > Labels: features > Attachments: 13134.patch > > Original Estimate: 96h > Remaining Estimate: 96h > > Currently each Kafka topic/partition corresponds to exactly one RDD partition; in some > cases it is necessary to make this configurable, i.e., a configurable ratio of > RDD partitions to Kafka topic partitions is needed.
[jira] [Updated] (SPARK-13134) add 'spark.streaming.kafka.partition.multiplier' into SparkConf
[ https://issues.apache.org/jira/browse/SPARK-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengcanbin updated SPARK-13134: Attachment: 13134.patch > add 'spark.streaming.kafka.partition.multiplier' into SparkConf > --- > > Key: SPARK-13134 > URL: https://issues.apache.org/jira/browse/SPARK-13134 > Project: Spark > Issue Type: Sub-task > Components: Input/Output >Affects Versions: 1.6.1 >Reporter: zhengcanbin > Attachments: 13134.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
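The proposed multiplier could work by splitting each Kafka partition's offset range into several RDD partitions. The sketch below is purely hypothetical: `OffsetRange` here is a stand-in for the real class, and the splitting policy (ceiling division of the range) is one plausible interpretation of the attached patch, not its actual contents.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for Spark's Kafka OffsetRange (from inclusive, until exclusive).
record OffsetRange(String topic, int partition, long from, long until) {}

final class Splitter {
    // Split every Kafka partition's offset range into up to `multiplier`
    // sub-ranges, yielding more (smaller) RDD partitions per topic partition.
    static List<OffsetRange> split(List<OffsetRange> ranges, int multiplier) {
        List<OffsetRange> out = new ArrayList<>();
        for (OffsetRange r : ranges) {
            long total = r.until() - r.from();
            long step = Math.max(1L, (long) Math.ceil(total / (double) multiplier));
            for (long start = r.from(); start < r.until(); start += step) {
                out.add(new OffsetRange(r.topic(), r.partition(),
                        start, Math.min(start + step, r.until())));
            }
        }
        return out;
    }
}
```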
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188358#comment-15188358 ] yuhao yang commented on SPARK-13783: I'm interested. > Model export/import for spark.ml: GBTs > -- > > Key: SPARK-13783 > URL: https://issues.apache.org/jira/browse/SPARK-13783 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Joseph K. Bradley > > This JIRA is for both GBTClassifier and GBTRegressor. The implementation > should reuse the one for DecisionTree*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models
[ https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13430: Assignee: Apache Spark > Expose ml summary function in PySpark for classification and regression models > -- > > Key: SPARK-13430 > URL: https://issues.apache.org/jira/browse/SPARK-13430 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Reporter: Shubhanshu Mishra >Assignee: Apache Spark > Labels: classification, java, ml, mllib, pyspark, regression, > scala, sparkr > > I think the model summary interface, which is available in Spark's Scala, Java, and > R APIs, should also be available in the Python interface. > Similar to #SPARK-11494 > https://issues.apache.org/jira/browse/SPARK-11494
[jira] [Commented] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models
[ https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188336#comment-15188336 ] Apache Spark commented on SPARK-13430: -- User 'BryanCutler' has created a pull request for this issue: https://github.com/apache/spark/pull/11621 > Expose ml summary function in PySpark for classification and regression models > -- > > Key: SPARK-13430 > URL: https://issues.apache.org/jira/browse/SPARK-13430 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Reporter: Shubhanshu Mishra > Labels: classification, java, ml, mllib, pyspark, regression, > scala, sparkr > > I think the model summary interface, which is available in Spark's Scala, Java, and > R APIs, should also be available in the Python interface. > Similar to #SPARK-11494 > https://issues.apache.org/jira/browse/SPARK-11494
[jira] [Assigned] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models
[ https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13430: Assignee: (was: Apache Spark) > Expose ml summary function in PySpark for classification and regression models > -- > > Key: SPARK-13430 > URL: https://issues.apache.org/jira/browse/SPARK-13430 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Reporter: Shubhanshu Mishra > Labels: classification, java, ml, mllib, pyspark, regression, > scala, sparkr > > I think the model summary interface, which is available in Spark's Scala, Java, and > R APIs, should also be available in the Python interface. > Similar to #SPARK-11494 > https://issues.apache.org/jira/browse/SPARK-11494
[jira] [Assigned] (SPARK-13761) Deprecate validateParams
[ https://issues.apache.org/jira/browse/SPARK-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13761: Assignee: Apache Spark > Deprecate validateParams > > > Key: SPARK-13761 > URL: https://issues.apache.org/jira/browse/SPARK-13761 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Assignee: Apache Spark >Priority: Minor > > Deprecate validateParams() method here: > [https://github.com/apache/spark/blob/035d3acdf3c1be5b309a861d5c5beb803b946b5e/mllib/src/main/scala/org/apache/spark/ml/param/params.scala#L553] > Move all functionality in overridden methods to transformSchema(). > Check docs to make sure they indicate complex Param interaction checks should > be done in transformSchema. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13761) Deprecate validateParams
[ https://issues.apache.org/jira/browse/SPARK-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188329#comment-15188329 ] Apache Spark commented on SPARK-13761: -- User 'hhbyyh' has created a pull request for this issue: https://github.com/apache/spark/pull/11620 > Deprecate validateParams > > > Key: SPARK-13761 > URL: https://issues.apache.org/jira/browse/SPARK-13761 > Project: Spark > Issue Type: Improvement > Components: ML > Reporter: Joseph K. Bradley > Priority: Minor > > Deprecate the validateParams() method here: > [https://github.com/apache/spark/blob/035d3acdf3c1be5b309a861d5c5beb803b946b5e/mllib/src/main/scala/org/apache/spark/ml/param/params.scala#L553] > Move all functionality in overridden methods to transformSchema(). > Check the docs to make sure they indicate that complex Param interaction checks should be done in transformSchema().
[jira] [Assigned] (SPARK-13761) Deprecate validateParams
[ https://issues.apache.org/jira/browse/SPARK-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13761: Assignee: (was: Apache Spark) > Deprecate validateParams > > > Key: SPARK-13761 > URL: https://issues.apache.org/jira/browse/SPARK-13761 > Project: Spark > Issue Type: Improvement > Components: ML > Reporter: Joseph K. Bradley > Priority: Minor > > Deprecate the validateParams() method here: > [https://github.com/apache/spark/blob/035d3acdf3c1be5b309a861d5c5beb803b946b5e/mllib/src/main/scala/org/apache/spark/ml/param/params.scala#L553] > Move all functionality in overridden methods to transformSchema(). > Check the docs to make sure they indicate that complex Param interaction checks should be done in transformSchema().
[jira] [Commented] (SPARK-13068) Extend pyspark ml paramtype conversion to support lists
[ https://issues.apache.org/jira/browse/SPARK-13068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188319#comment-15188319 ] Joseph K. Bradley commented on SPARK-13068: --- You're right that the current implementation would not support nested types well. But I don't think we need full-blown ParamValidators; we really need a different concept in Python than in Scala: Python needs conversion, whereas Scala can handle validation. What if, instead of expectedType, we used a new field "typeConverter"? It could be passed as an argument to Param and used where expectedType is currently used. We could deprecate expectedType for 2.0 and remove it in 2.1. How does that sound? > Extend pyspark ml paramtype conversion to support lists > --- > > Key: SPARK-13068 > URL: https://issues.apache.org/jira/browse/SPARK-13068 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark > Reporter: holdenk > Priority: Trivial > > In SPARK-7675 we added type conversion for PySpark ML params. We should follow up and support param type conversion for lists and nested structures as required. This blocks having type information on all PySpark ML params.
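The "typeConverter" idea in the comment above can be sketched as follows. This is a hypothetical illustration, not the actual pyspark.ml.param API (the Param constructor signature and the toListFloat helper are invented for this example); it shows how a conversion function attached to a Param could handle list values, which a simple expectedType check cannot.

```python
# Hypothetical sketch of a Param carrying a typeConverter instead of an
# expectedType; names are illustrative, not the real pyspark.ml API.
class Param(object):
    def __init__(self, name, doc, typeConverter=None):
        self.name = name
        self.doc = doc
        # Default converter is the identity: accept the value as given.
        self.typeConverter = typeConverter or (lambda v: v)

def toListFloat(value):
    """Convert a list or tuple of numbers to a list of floats, the way a
    nested-type converter might."""
    if isinstance(value, (list, tuple)):
        return [float(v) for v in value]
    raise TypeError("Could not convert %r to a list of floats" % (value,))

weights = Param("weights", "per-class weights", typeConverter=toListFloat)
print(weights.typeConverter([1, 2]))  # [1.0, 2.0]
```

Unlike a bare expectedType check, the converter both validates and coerces, so `[1, 2]` becomes `[1.0, 2.0]` rather than being rejected for not already being a list of floats.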
[jira] [Commented] (SPARK-13789) Infer additional constraints from attribute equality
[ https://issues.apache.org/jira/browse/SPARK-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188307#comment-15188307 ] Apache Spark commented on SPARK-13789: -- User 'sameeragarwal' has created a pull request for this issue: https://github.com/apache/spark/pull/11618 > Infer additional constraints from attribute equality > > > Key: SPARK-13789 > URL: https://issues.apache.org/jira/browse/SPARK-13789 > Project: Spark > Issue Type: Sub-task > Components: SQL > Reporter: Sameer Agarwal > > We should be able to infer an additional set of data constraints based on attribute equality. For example, if an operator has constraints of the form (`a = 5`, `a = b`), we should be able to infer an additional constraint of the form `b = 5`. > cc [~nongli]
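The inference described above (given `a = 5` and `a = b`, derive `b = 5`) amounts to substituting equal attributes into the other constraints until a fixed point is reached. The sketch below is a toy model of that rule, not Catalyst's actual implementation, which operates on Scala Expression trees; the tuple encoding and function name are purely illustrative.

```python
# Toy model of equality-based constraint inference: constraints are
# ("eq", left, right) tuples where operands are attribute names (str)
# or literal values. Repeatedly substitute equal attributes until no
# new constraints appear (a fixed point).
def infer_constraints(constraints):
    inferred = set(constraints)
    changed = True
    while changed:
        changed = False
        for op, left, right in list(inferred):
            # Only attribute-attribute equalities drive substitution.
            if op != "eq" or not (isinstance(left, str) and isinstance(right, str)):
                continue
            for op2, l2, r2 in list(inferred):
                # Replace occurrences of `left` with `right` in every
                # other constraint, e.g. a = 5 with a = b gives b = 5.
                new = (op2,
                       right if l2 == left else l2,
                       right if r2 == left else r2)
                if new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred

constraints = {("eq", "a", 5), ("eq", "a", "b")}
print(("eq", "b", 5) in infer_constraints(constraints))  # True
```

A fixed-point loop like this is the standard shape for such inference: each pass can expose new equalities (e.g. a transitive chain `a = b`, `b = c`), so the rule reruns until the constraint set stops growing.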
[jira] [Assigned] (SPARK-13789) Infer additional constraints from attribute equality
[ https://issues.apache.org/jira/browse/SPARK-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13789: Assignee: (was: Apache Spark) > Infer additional constraints from attribute equality > > > Key: SPARK-13789 > URL: https://issues.apache.org/jira/browse/SPARK-13789 > Project: Spark > Issue Type: Sub-task > Components: SQL > Reporter: Sameer Agarwal > > We should be able to infer an additional set of data constraints based on attribute equality. For example, if an operator has constraints of the form (`a = 5`, `a = b`), we should be able to infer an additional constraint of the form `b = 5`. > cc [~nongli]