[jira] [Assigned] (SPARK-10331) Update user guide to address minor comments during code review
[ https://issues.apache.org/jira/browse/SPARK-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10331: Assignee: Apache Spark (was: Xiangrui Meng) > Update user guide to address minor comments during code review > -- > > Key: SPARK-10331 > URL: https://issues.apache.org/jira/browse/SPARK-10331 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML, MLlib >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Assignee: Apache Spark > > Clean up user guides to address some minor comments in: > https://github.com/apache/spark/pull/8304 > https://github.com/apache/spark/pull/8487
[jira] [Assigned] (SPARK-10331) Update user guide to address minor comments during code review
[ https://issues.apache.org/jira/browse/SPARK-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10331: Assignee: Xiangrui Meng (was: Apache Spark) > Update user guide to address minor comments during code review > -- > > Key: SPARK-10331 > URL: https://issues.apache.org/jira/browse/SPARK-10331 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML, MLlib >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > Clean up user guides to address some minor comments in: > https://github.com/apache/spark/pull/8304 > https://github.com/apache/spark/pull/8487
[jira] [Commented] (SPARK-10331) Update user guide to address minor comments during code review
[ https://issues.apache.org/jira/browse/SPARK-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721007#comment-14721007 ] Apache Spark commented on SPARK-10331: -- User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/8518 > Update user guide to address minor comments during code review > -- > > Key: SPARK-10331 > URL: https://issues.apache.org/jira/browse/SPARK-10331 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML, MLlib >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > Clean up user guides to address some minor comments in: > https://github.com/apache/spark/pull/8304 > https://github.com/apache/spark/pull/8487
[jira] [Updated] (SPARK-10175) Enhance spark doap file
[ https://issues.apache.org/jira/browse/SPARK-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10175: -- Shepherd: (was: Matei Zaharia) Assignee: Sean Owen Target Version/s: 1.5.0 Priority: Minor (was: Major) No problem, I can get this one in as I am familiar with updating the site in SVN. > Enhance spark doap file > --- > > Key: SPARK-10175 > URL: https://issues.apache.org/jira/browse/SPARK-10175 > Project: Spark > Issue Type: Bug > Components: Project Infra >Reporter: Luciano Resende >Assignee: Sean Owen >Priority: Minor > Attachments: SPARK-10175 > > > The Spark doap has broken links and is also missing entries related to issue > tracker and mailing lists. This affects the list in projects.apache.org and > also in the main apache website.
[jira] [Resolved] (SPARK-10175) Enhance spark doap file
[ https://issues.apache.org/jira/browse/SPARK-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-10175. --- Resolution: Fixed Assignee: Luciano Resende (was: Sean Owen) Fix Version/s: 1.5.0 Fixed in SVN revision 1698445. I'll call this fixed for 1.5.0 even though it's not part of the project's source release per se. > Enhance spark doap file > --- > > Key: SPARK-10175 > URL: https://issues.apache.org/jira/browse/SPARK-10175 > Project: Spark > Issue Type: Bug > Components: Project Infra >Reporter: Luciano Resende >Assignee: Luciano Resende >Priority: Minor > Fix For: 1.5.0 > > Attachments: SPARK-10175 > > > The Spark doap has broken links and is also missing entries related to issue > tracker and mailing lists. This affects the list in projects.apache.org and > also in the main apache website.
[jira] [Created] (SPARK-10349) OneVsRest use "when ... otherwise" not UDF to generate new label at binary reduction
Yanbo Liang created SPARK-10349: --- Summary: OneVsRest use "when ... otherwise" not UDF to generate new label at binary reduction Key: SPARK-10349 URL: https://issues.apache.org/jira/browse/SPARK-10349 Project: Spark Issue Type: Improvement Components: ML Reporter: Yanbo Liang Priority: Minor Currently OneVsRest uses a UDF to generate the new binary label during training. Considering that SPARK-7321 has been merged, we can use "when ... otherwise"
[jira] [Assigned] (SPARK-10349) OneVsRest use "when ... otherwise" not UDF to generate new label at binary reduction
[ https://issues.apache.org/jira/browse/SPARK-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10349: Assignee: (was: Apache Spark) > OneVsRest use "when ... otherwise" not UDF to generate new label at binary > reduction > -- > > Key: SPARK-10349 > URL: https://issues.apache.org/jira/browse/SPARK-10349 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Priority: Minor > > Currently OneVsRest uses a UDF to generate the new binary label during training. > Considering that SPARK-7321 has been merged, we can use "when ... otherwise"
[jira] [Commented] (SPARK-10349) OneVsRest use "when ... otherwise" not UDF to generate new label at binary reduction
[ https://issues.apache.org/jira/browse/SPARK-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721026#comment-14721026 ] Apache Spark commented on SPARK-10349: -- User 'yanboliang' has created a pull request for this issue: https://github.com/apache/spark/pull/8519 > OneVsRest use "when ... otherwise" not UDF to generate new label at binary > reduction > -- > > Key: SPARK-10349 > URL: https://issues.apache.org/jira/browse/SPARK-10349 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Priority: Minor > > Currently OneVsRest uses a UDF to generate the new binary label during training. > Considering that SPARK-7321 has been merged, we can use "when ... otherwise"
[jira] [Assigned] (SPARK-10349) OneVsRest use "when ... otherwise" not UDF to generate new label at binary reduction
[ https://issues.apache.org/jira/browse/SPARK-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10349: Assignee: Apache Spark > OneVsRest use "when ... otherwise" not UDF to generate new label at binary > reduction > -- > > Key: SPARK-10349 > URL: https://issues.apache.org/jira/browse/SPARK-10349 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Assignee: Apache Spark >Priority: Minor > > Currently OneVsRest uses a UDF to generate the new binary label during training. > Considering that SPARK-7321 has been merged, we can use "when ... otherwise"
[jira] [Updated] (SPARK-10349) OneVsRest use "when ... otherwise" not UDF to generate new label at binary reduction
[ https://issues.apache.org/jira/browse/SPARK-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-10349: Description: Currently OneVsRest uses a UDF to generate the new binary label during training. Considering that SPARK-7321 has been merged, we can use "when ... otherwise", which will be more efficient. was: Currently OneVsRest use UDF to generate new binary label during training. Considering that SPARK-7321 has been merged, we can use "when ... otherwise" > OneVsRest use "when ... otherwise" not UDF to generate new label at binary > reduction > -- > > Key: SPARK-10349 > URL: https://issues.apache.org/jira/browse/SPARK-10349 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Priority: Minor > > Currently OneVsRest uses a UDF to generate the new binary label during training. > Considering that SPARK-7321 has been merged, we can use "when ... otherwise", > which will be more efficient.
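For illustration, here is a minimal sketch of the two approaches being compared (hypothetical column names; this is not the actual OneVsRest code), assuming {{df}} is a DataFrame with a numeric {{label}} column:
{code}
import org.apache.spark.sql.functions._

val classIndex = 1.0  // hypothetical: the class currently being trained one-vs-rest

// UDF approach: the function body is opaque to Catalyst, so it cannot be
// analyzed or optimized.
val labelUdf = udf { label: Double => if (label == classIndex) 1.0 else 0.0 }
val viaUdf = df.withColumn("binaryLabel", labelUdf(col("label")))

// "when ... otherwise" approach (available since SPARK-7321): a native
// Column expression that Catalyst can inspect and optimize.
val viaExpr = df.withColumn(
  "binaryLabel",
  when(col("label") === classIndex, 1.0).otherwise(0.0))
{code}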
[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721066#comment-14721066 ] Maruf Aytekin commented on SPARK-5992: -- I have developed a Spark implementation of LSH (Charikar's scheme) for collections of vectors. It is published here: https://github.com/marufaytekin/lsh-spark. The details are documented in the Readme.md file. I'd really appreciate it if you could check it out and provide feedback. > Locality Sensitive Hashing (LSH) for MLlib > -- > > Key: SPARK-5992 > URL: https://issues.apache.org/jira/browse/SPARK-5992 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Joseph K. Bradley > > Locality Sensitive Hashing (LSH) would be very useful for ML. It would be > great to discuss some possible algorithms here, choose an API, and make a PR > for an initial algorithm.
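For readers unfamiliar with Charikar's scheme, a minimal standalone sketch of random-hyperplane LSH (illustrative only; not code from the linked repository):
{code}
import scala.util.Random

// Each hash bit is the sign of the dot product between the input vector
// and a random Gaussian hyperplane.
def hyperplanes(numBits: Int, dim: Int, seed: Long = 42L): Array[Array[Double]] = {
  val rnd = new Random(seed)
  Array.fill(numBits, dim)(rnd.nextGaussian())
}

def signature(v: Array[Double], planes: Array[Array[Double]]): Array[Int] =
  planes.map { p =>
    val dot = p.zip(v).map { case (a, b) => a * b }.sum
    if (dot >= 0) 1 else 0
  }

// The fraction of differing bits estimates the angle between two vectors
// (P[bit differs] = theta(u, v) / pi), so similar vectors collide often.
def hammingDistance(s1: Array[Int], s2: Array[Int]): Int =
  s1.zip(s2).count { case (a, b) => a != b }
{code}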
[jira] [Created] (SPARK-10350) Fix SQL Programming Guide
Guoqiang Li created SPARK-10350: --- Summary: Fix SQL Programming Guide Key: SPARK-10350 URL: https://issues.apache.org/jira/browse/SPARK-10350 Project: Spark Issue Type: Bug Components: Documentation, SQL Affects Versions: 1.5.0 Reporter: Guoqiang Li Priority: Minor
[jira] [Assigned] (SPARK-10350) Fix SQL Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10350: Assignee: Apache Spark > Fix SQL Programming Guide > - > > Key: SPARK-10350 > URL: https://issues.apache.org/jira/browse/SPARK-10350 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 1.5.0 >Reporter: Guoqiang Li >Assignee: Apache Spark >Priority: Minor >
[jira] [Commented] (SPARK-10350) Fix SQL Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721074#comment-14721074 ] Apache Spark commented on SPARK-10350: -- User 'witgo' has created a pull request for this issue: https://github.com/apache/spark/pull/8520 > Fix SQL Programming Guide > - > > Key: SPARK-10350 > URL: https://issues.apache.org/jira/browse/SPARK-10350 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 1.5.0 >Reporter: Guoqiang Li >Priority: Minor >
[jira] [Assigned] (SPARK-10350) Fix SQL Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10350: Assignee: (was: Apache Spark) > Fix SQL Programming Guide > - > > Key: SPARK-10350 > URL: https://issues.apache.org/jira/browse/SPARK-10350 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 1.5.0 >Reporter: Guoqiang Li >Priority: Minor >
[jira] [Updated] (SPARK-10350) Fix SQL Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10350: -- Target Version/s: (was: 1.5.0) [~gq] this doesn't contain any explanation, and neither does the pull request. I think you're familiar with the process for creating JIRAs and PRs in Spark: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark Could I please ask you to write a clear description of the change, or else close this? > Fix SQL Programming Guide > - > > Key: SPARK-10350 > URL: https://issues.apache.org/jira/browse/SPARK-10350 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 1.5.0 >Reporter: Guoqiang Li >Priority: Minor >
[jira] [Updated] (SPARK-10350) Fix SQL Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-10350: Description: [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95] contains duplicate content: [[spark.sql.parquet.mergeSchema]] > Fix SQL Programming Guide > - > > Key: SPARK-10350 > URL: https://issues.apache.org/jira/browse/SPARK-10350 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 1.5.0 >Reporter: Guoqiang Li >Priority: Minor > > [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95] > contains duplicate content: [[spark.sql.parquet.mergeSchema]]
[jira] [Updated] (SPARK-10350) Fix SQL Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-10350: Description: [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95] contains duplicate content: {{spark.sql.parquet.mergeSchema}} (was: [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95] contains duplicate content: [[spark.sql.parquet.mergeSchema]]) > Fix SQL Programming Guide > - > > Key: SPARK-10350 > URL: https://issues.apache.org/jira/browse/SPARK-10350 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 1.5.0 >Reporter: Guoqiang Li >Priority: Minor > > [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95] > contains duplicate content: {{spark.sql.parquet.mergeSchema}}
[jira] [Updated] (SPARK-10350) Fix SQL Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-10350: Description: [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95#diff-d8aa7a37d17a1227cba38c99f9f22511R1383] contains duplicate content: {{spark.sql.parquet.mergeSchema}} (was: [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95] contains duplicate content: {{spark.sql.parquet.mergeSchema}}) > Fix SQL Programming Guide > - > > Key: SPARK-10350 > URL: https://issues.apache.org/jira/browse/SPARK-10350 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 1.5.0 >Reporter: Guoqiang Li >Priority: Minor > > [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95#diff-d8aa7a37d17a1227cba38c99f9f22511R1383] > contains duplicate content: {{spark.sql.parquet.mergeSchema}}
[jira] [Commented] (SPARK-10301) For struct type, if parquet's global schema has fewer fields than a file's schema, data reading will fail
[ https://issues.apache.org/jira/browse/SPARK-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721187#comment-14721187 ] Apache Spark commented on SPARK-10301: -- User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/8515 > For struct type, if parquet's global schema has fewer fields than a file's > schema, data reading will fail > > > Key: SPARK-10301 > URL: https://issues.apache.org/jira/browse/SPARK-10301 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Critical > > When parquet's global schema has fewer fields than the local schema > of a file, the data reading path will fail.
[jira] [Updated] (SPARK-10340) Use S3 bulk listing for S3-backed Hive tables
[ https://issues.apache.org/jira/browse/SPARK-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10340: - Target Version/s: 1.5.0 > Use S3 bulk listing for S3-backed Hive tables > - > > Key: SPARK-10340 > URL: https://issues.apache.org/jira/browse/SPARK-10340 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.4.1, 1.5.0 >Reporter: Cheolsoo Park >Assignee: Cheolsoo Park > > AWS S3 provides a bulk listing API. It takes the common prefix of all input > paths as a parameter and returns all the objects whose prefixes start with > the common prefix, in blocks of 1000. > Since SPARK-9926 allows us to list multiple partitions all together, we can > significantly speed up input split calculation using S3 bulk listing. This > optimization is particularly useful for queries like {{select * from > partitioned_table limit 10}}. > This is a common optimization for S3. For example, here is a [blog > post|http://www.qubole.com/blog/product/optimizing-hadoop-for-s3-part-1/] > from Qubole on this topic.
[jira] [Updated] (SPARK-10340) Use S3 bulk listing for S3-backed Hive tables
[ https://issues.apache.org/jira/browse/SPARK-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10340: - Target Version/s: 1.6.0 (was: 1.5.0) > Use S3 bulk listing for S3-backed Hive tables > - > > Key: SPARK-10340 > URL: https://issues.apache.org/jira/browse/SPARK-10340 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.4.1, 1.5.0 >Reporter: Cheolsoo Park >Assignee: Cheolsoo Park > > AWS S3 provides a bulk listing API. It takes the common prefix of all input > paths as a parameter and returns all the objects whose prefixes start with > the common prefix, in blocks of 1000. > Since SPARK-9926 allows us to list multiple partitions all together, we can > significantly speed up input split calculation using S3 bulk listing. This > optimization is particularly useful for queries like {{select * from > partitioned_table limit 10}}. > This is a common optimization for S3. For example, here is a [blog > post|http://www.qubole.com/blog/product/optimizing-hadoop-for-s3-part-1/] > from Qubole on this topic.
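For context, a rough sketch of what bulk listing looks like against the AWS SDK for Java v1 (the bucket and prefix are hypothetical, and this is not the code proposed for Spark): one request over the common prefix of all partition paths returns keys in pages of up to 1000, instead of one listing call per partition directory.
{code}
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.ListObjectsRequest
import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer

val s3 = new AmazonS3Client()
val request = new ListObjectsRequest()
  .withBucketName("my-bucket")                         // hypothetical bucket
  .withPrefix("warehouse/nccp_log/dateint=20150601/")  // hypothetical common prefix
  .withMaxKeys(1000)

// Page through the listing; each page holds up to 1000 object summaries.
val keys = ArrayBuffer[String]()
var listing = s3.listObjects(request)
keys ++= listing.getObjectSummaries.asScala.map(_.getKey)
while (listing.isTruncated) {
  listing = s3.listNextBatchOfObjects(listing)
  keys ++= listing.getObjectSummaries.asScala.map(_.getKey)
}
{code}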
[jira] [Resolved] (SPARK-10350) Fix SQL Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10350. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8520 [https://github.com/apache/spark/pull/8520] > Fix SQL Programming Guide > - > > Key: SPARK-10350 > URL: https://issues.apache.org/jira/browse/SPARK-10350 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 1.5.0 >Reporter: Guoqiang Li >Priority: Minor > Fix For: 1.5.0 > > > [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95#diff-d8aa7a37d17a1227cba38c99f9f22511R1383] > contains duplicate content: {{spark.sql.parquet.mergeSchema}}
[jira] [Updated] (SPARK-10170) Writing from data frame into db2 database using jdbc data source api fails with error for string, and boolean column types.
[ https://issues.apache.org/jira/browse/SPARK-10170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10170: - Target Version/s: 1.6.0
> Writing from data frame into db2 database using jdbc data source api fails with error for string, and boolean column types.
> ---
>
> Key: SPARK-10170
> URL: https://issues.apache.org/jira/browse/SPARK-10170
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.1, 1.5.0
> Reporter: Suresh Thalamati
>
> Repro:
> -- start spark-shell with the classpath set to the DB2 JDBC driver.
> SPARK_CLASSPATH=~/myjars/db2jcc.jar ./spark-shell
>
> // set connection properties
> val properties = new java.util.Properties()
> properties.setProperty("user", "user")
> properties.setProperty("password", "password")
> // load the driver.
> Class.forName("com.ibm.db2.jcc.DB2Driver").newInstance
> // create a data frame with a String type
> val empdf = sc.parallelize(Array((1, "John"), (2, "Mike"))).toDF("id", "name")
> // write the data frame. this will fail with an error.
> empdf.write.jdbc("jdbc:db2://bdvs150.svl.ibm.com:6/SAMPLE:retrieveMessagesFromServerOnGetMessage=true;", "emp_data", properties)
> Error:
> com.ibm.db2.jcc.am.SqlSyntaxErrorException: TEXT
> at com.ibm.db2.jcc.am.fd.a(fd.java:679)
> at com.ibm.db2.jcc.am.fd.a(fd.java:60)
> ..
> // create a data frame with String and Boolean types
> val empdf = sc.parallelize(Array((1, "true".toBoolean), (2, "false".toBoolean))).toDF("id", "isManager")
> // write the data frame. this will fail with an error.
> empdf.write.jdbc("jdbc:db2://:/SAMPLE:retrieveMessagesFromServerOnGetMessage=true;", "emp_data", properties)
> Error:
> com.ibm.db2.jcc.am.SqlSyntaxErrorException: TEXT
> at com.ibm.db2.jcc.am.fd.a(fd.java:679)
> at com.ibm.db2.jcc.am.fd.a(fd.java:60)
> The write is failing because, by default, the JDBC data source implementation generates a table schema with unsupported data types: TEXT for String and BIT(1) for Boolean. I think the String type should be mapped to CLOB/VARCHAR, and the Boolean type should be mapped to CHAR(1) for DB2.
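The remapping suggested above can be expressed as a custom dialect via Spark's {{JdbcDialect}} API; a hedged sketch (the concrete choices {{VARCHAR(255)}} and {{CHAR(1)}} are illustrative, not an official DB2 dialect):
{code}
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// Overrides the default String/Boolean DDL types when writing to DB2.
object DB2DialectSketch extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:db2")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType  => Some(JdbcType("VARCHAR(255)", Types.VARCHAR))
    case BooleanType => Some(JdbcType("CHAR(1)", Types.CHAR))
    case _           => None  // fall back to the built-in mappings
  }
}

JdbcDialects.registerDialect(DB2DialectSketch)
{code}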
[jira] [Resolved] (SPARK-10344) Add tests for extraStrategies
[ https://issues.apache.org/jira/browse/SPARK-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-10344. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8516 [https://github.com/apache/spark/pull/8516] > Add tests for extraStrategies > - > > Key: SPARK-10344 > URL: https://issues.apache.org/jira/browse/SPARK-10344 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Michael Armbrust >Assignee: Michael Armbrust > Fix For: 1.5.0 > >
[jira] [Resolved] (SPARK-10226) Error occurred in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10226. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8420 [https://github.com/apache/spark/pull/8420]
> Error occurred in SparkSQL when using !=
> 
> Key: SPARK-10226
> URL: https://issues.apache.org/jira/browse/SPARK-10226
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0
> Reporter: wangwei
> Fix For: 1.5.0
>
> DataSource: src/main/resources/kv1.txt
> SQL:
> 1. create table src(id string, name string);
> 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src;
> 3. select count(*) from src where id != '0';
> [ERROR] Could not expand event
> java.lang.IllegalArgumentException: != 0;: event not found
> at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779)
> at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631)
> at jline.console.ConsoleReader.accept(ConsoleReader.java:2019)
> at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666)
> at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231)
> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
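The stack trace shows jline's history expansion intercepting the {{!}} character in the interactive CLI before the SQL parser ever sees the query. As a hypothetical workaround until the CLI is fixed, the equivalent {{<>}} operator avoids the character entirely (assuming a {{sqlContext}} as in spark-shell):
{code}
// <> is equivalent to != in HiveQL/Spark SQL and does not trigger
// jline's `!`-based event expansion in the CLI.
sqlContext.sql("select count(*) from src where id <> '0'").show()
{code}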
[jira] [Commented] (SPARK-10330) Use SparkHadoopUtil TaskAttemptContext reflection methods in more places
[ https://issues.apache.org/jira/browse/SPARK-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721275#comment-14721275 ] Apache Spark commented on SPARK-10330: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/8521 > Use SparkHadoopUtil TaskAttemptContext reflection methods in more places > > > Key: SPARK-10330 > URL: https://issues.apache.org/jira/browse/SPARK-10330 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Josh Rosen > > SparkHadoopUtil contains methods that use reflection to work around > TaskAttemptContext binary incompatibilities between Hadoop 1.x and 2.x. We > should use these methods in more places.
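For context, a rough sketch of the reflection trick involved (paraphrased; SparkHadoopUtil's actual helpers differ in detail). {{TaskAttemptContext}} was a class in Hadoop 1.x and became an interface in 2.x, so a direct {{getConfiguration}} call compiled against one binary can break against the other, while a reflective call works against both:
{code}
import org.apache.hadoop.conf.Configuration

// Look the method up on the runtime class, so it does not matter whether
// TaskAttemptContext is the 1.x class or the 2.x interface.
def getConfigurationReflectively(context: AnyRef): Configuration = {
  val method = context.getClass.getMethod("getConfiguration")
  method.invoke(context).asInstanceOf[Configuration]
}
{code}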
[jira] [Updated] (SPARK-9926) Parallelize file listing for partitioned Hive table
[ https://issues.apache.org/jira/browse/SPARK-9926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-9926: --- Assignee: Cheolsoo Park
> Parallelize file listing for partitioned Hive table
> ---
>
> Key: SPARK-9926
> URL: https://issues.apache.org/jira/browse/SPARK-9926
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.4.1, 1.5.0
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
>
> In Spark SQL, short queries like {{select * from table limit 10}} run very slowly against partitioned Hive tables because of file listing. In particular, if a large number of partitions are scanned on storage like S3, the queries run extremely slowly. Here are some example benchmarks in my environment:
> * Parquet-backed Hive table
> * Partitioned by dateint and hour
> * Stored on S3
> ||\# of partitions||\# of files||runtime||query||
> |1|972|30 secs|select * from nccp_log where dateint=20150601 and hour=0 limit 10;|
> |24|13646|6 mins|select * from nccp_log where dateint=20150601 limit 10;|
> |240|136222|1 hour|select * from nccp_log where dateint>=20150601 and dateint<=20150610 limit 10;|
> The problem is that {{TableReader}} constructs a separate HadoopRDD per Hive partition path and groups them into a UnionRDD. Then, all the input files are listed sequentially. In other tools such as Hive and Pig, this can be solved by setting [mapreduce.input.fileinputformat.list-status.num-threads|https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml] high. But in Spark, since each HadoopRDD lists only one partition path, setting this property doesn't help.
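For reference, that property is an ordinary Hadoop configuration knob; a minimal sketch of setting it from Spark (assuming {{sc}} is a SparkContext). It raises listing parallelism within a single FileInputFormat, which is exactly why it does not help the one-HadoopRDD-per-partition pattern described above:
{code}
// Parallelizes file listing inside a single FileInputFormat job; with one
// HadoopRDD per partition path, each listing still covers only one path.
sc.hadoopConfiguration.set(
  "mapreduce.input.fileinputformat.list-status.num-threads", "32")
{code}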
[jira] [Created] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory
Feynman Liang created SPARK-10351: - Summary: UnsafeRow.getUTF8String should handle off-heap memory Key: SPARK-10351 URL: https://issues.apache.org/jira/browse/SPARK-10351 Project: Spark Issue Type: Bug Components: SQL Reporter: Feynman Liang Priority: Critical {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which does not handle off-heap memory correctly.
[jira] [Commented] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721286#comment-14721286 ] Feynman Liang commented on SPARK-10351: --- I'm working on a PR to make my use case work. [~rxin] is this a bug or actually intended behavior (and I'm just not interpreting correctly)? > UnsafeRow.getUTF8String should handle off-heap memory > - > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > does not handle off-heap memory correctly.
[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10351: -- Description: {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which returns {{null}} when passed a {{null}} base object, failing to handle off-heap memory correctly. (was: {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which does not handle off-heap memory correctly. ) > UnsafeRow.getUTF8String should handle off-heap memory > - > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap memory correctly.
[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10351: -- Description: {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which returns {{null}} when passed a {{null}} base object, failing to handle off-heap memory correctly. This will also cause a {{NullPointerException}} when {{getString}} is called with off-heap storage. was:{{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which returns {{null}} when passed a {{null}} base object, failing to handle off-heap memory correctly. > UnsafeRow.getUTF8String should handle off-heap memory > - > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap memory correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage.
[jira] [Comment Edited] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721286#comment-14721286 ] Feynman Liang edited comment on SPARK-10351 at 8/29/15 11:12 PM: - I'm working on a PR to fix this. [~rxin] is this a bug or actually intended behavior (and I'm just not interpreting correctly)? was (Author: fliang): I'm working on a PR to make my use case work. [~rxin] is this a bug or actually intended behavior (and I'm just not interpreting correctly)? > UnsafeRow.getUTF8String should handle off-heap memory > - > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap memory correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage.
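To make the failure mode concrete, a paraphrased sketch of the pattern being described (not the actual Spark source): an UnsafeRow string field is addressed by a (base object, address, length) triple, and for off-heap memory the base object is legitimately null.
{code}
// Paraphrased sketch, not the actual Spark source.
case class Utf8Ref(base: AnyRef, address: Long, numBytes: Int)

def fromAddress(base: AnyRef, address: Long, numBytes: Int): Utf8Ref =
  if (base != null) Utf8Ref(base, address, numBytes)
  else null  // the off-heap case (null base) silently becomes null, so a
             // caller that immediately dereferences the result throws an NPE
{code}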
[jira] [Created] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
Feynman Liang created SPARK-10352: - Summary: BaseGenericInternalRow.getUTF8String should support java.lang.String Key: SPARK-10352 URL: https://issues.apache.org/jira/browse/SPARK-10352 Project: Spark Issue Type: Bug Components: SQL Reporter: Feynman Liang Running the code: {{ val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) }} generates the error: {{[info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip***}}
[jira] [Commented] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721289#comment-14721289 ] Feynman Liang commented on SPARK-10352: --- Working on a PR. [~rxin] can you confirm that this is a bug? > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {{ > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > }} > generates the error: > {{[info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip***}}
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} was: Running the code: {code scala} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code}
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {/code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {/code} was: Running the code: {{code}} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {{/code}} generates the error: {{code}} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {{/code}} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {/code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {/code}
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code scala} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} was: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {/code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {/code} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code scala} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code}
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {{code}} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {{/code}} generates the error: {{code}} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {{/code}} was: Running the code: {{ val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) }} generates the error: {{[info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip***}} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {{code}} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {{/code}} > generates the error: > {{code}} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {{/code}}
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code:scala} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} was: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code:scala} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code}
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} was: Running the code: {code:scala} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code}
[jira] [Assigned] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10352: Assignee: Apache Spark > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Assignee: Apache Spark > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code}
[jira] [Assigned] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10352: Assignee: (was: Apache Spark) > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code}
[jira] [Commented] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721291#comment-14721291 ] Apache Spark commented on SPARK-10352: -- User 'feynmanliang' has created a pull request for this issue: https://github.com/apache/spark/pull/8522 > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code}
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} Although `StringType` should in theory only have internal type `UTF8String`, we [are inconsistent with this constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] and being more strict would [break existing code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] was: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} > Although `StringType` should in theory only have internal type `UTF8String`, > we [are inconsistent with this > constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] > and being more strict would [break existing > code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41]
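For comparison, the conventional way to build an {{InternalRow}} with a string field is to convert to the internal representation first; a sketch (whether {{getUTF8String}} should also accept {{java.lang.String}} is the open question of this ticket):
{code}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String

// Converting to UTF8String up front avoids the ClassCastException above.
val row = InternalRow.apply(UTF8String.fromString("abc"))
val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}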
[jira] [Resolved] (SPARK-10334) Partitioned table scan's query plan does not show Filter and Project on top of the table scan
[ https://issues.apache.org/jira/browse/SPARK-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10334. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8515 [https://github.com/apache/spark/pull/8515] > Partitioned table scan's query plan does not show Filter and Project on top > of the table scan > - > > Key: SPARK-10334 > URL: https://issues.apache.org/jira/browse/SPARK-10334 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Critical > Fix For: 1.5.0 > > > {code} > Seq(Tuple2(1, 1), Tuple2(2, 2)).toDF("i", > "j").write.format("parquet").partitionBy("i").save("/tmp/testFilter_partitioned") > val df1 = > sqlContext.read.format("parquet").load("/tmp/testFilter_partitioned") > df1.selectExpr("hash(i)", "hash(j)").show > df1.filter("hash(j) = 1").explain > == Physical Plan == > Scan ParquetRelation[file:/tmp/testFilter_partitioned][j#20,i#21] > {code} > The reason appears to be that we correctly apply the project and filter, > then create an RDD for the result and manually create a PhysicalRDD, so the > Project and Filter on top of the original table scan disappear from the > physical plan. > See > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L138-L175 > We will not generate wrong results, but the query plan is confusing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10339) When scanning a partitioned table having thousands of partitions, Driver has a very high memory pressure because of SQL metrics
[ https://issues.apache.org/jira/browse/SPARK-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10339. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8515 [https://github.com/apache/spark/pull/8515] > When scanning a partitioned table having thousands of partitions, Driver has > a very high memory pressure because of SQL metrics > --- > > Key: SPARK-10339 > URL: https://issues.apache.org/jira/browse/SPARK-10339 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Blocker > Fix For: 1.5.0 > > > I have a local dataset having 5000 partitions stored in {{/tmp/partitioned}}. > When I run the following code, the free memory space in the driver's old gen > gradually decreases and eventually there is pretty much no free space left in > the driver's old gen. Finally, all kinds of timeouts happen and the cluster dies. > {code} > val df = sqlContext.read.format("parquet").load("/tmp/partitioned") > df.filter("a > -100").selectExpr("hash(a, b)").queryExecution.toRdd.foreach(_ > => Unit) > {code} > In a quick test with the SQL metrics deleted from the project and filter > operators, the job works fine. > The reason is that for a partitioned table, when we scan it, the actual plan > looks like
> {code}
>          other operators
>                 |
>          /------+------\
>         /       |       \
>        /        |        \
>  project     project ... project
>     |           |           |
>  filter      filter  ...  filter
>     |           |           |
>   part1       part2  ...  part n
> {code}
> We create SQL metrics for every filter and project, which causes extremely > high memory pressure on the driver. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
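To make the scale concrete, a back-of-the-envelope sketch (the per-operator metric count below is an assumed illustrative number, not taken from the ticket):
{code}
// One Project and one Filter per partition, each instance carrying its own
// SQL metrics that are registered with and tracked by the driver.
val partitions = 5000
val operatorsPerPartition = 2  // project + filter, as in the plan above
val metricsPerOperator = 2     // assumed for illustration
val driverSideAccumulators = partitions * operatorsPerPartition * metricsPerOperator
println(s"driver tracks roughly $driverSideAccumulators metric accumulators") // 20000
{code}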
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} Although {{StringType}} should in theory only have internal type {{UTF8String}}, we [are inconsistent with this constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] and being more strict would [break existing code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] was: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} Although `StringType` should in theory only have internal type `UTF8String`, we [are inconsistent with this constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] and being more strict would [break existing code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} > Although {{StringType}} should in theory only have internal type > {{UTF8String}}, we [are inconsistent with this > constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] > and being more strict would [break existing > code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10351: -- Description: {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which returns {{null}} when passed a {{null}} base object, failing to handle off-heap backed {{UnsafeRow}}s correctly. This will also cause a {{NullPointerException}} when {{getString}} is called with off-heap storage. was: {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which returns {{null}} when passed a {{null}} base object, failing to handle off-heap memory correctly. This will also cause a {{NullPointerException}} when {{getString}} is called with off-heap storage. > UnsafeRow.getUTF8String should handle off-heap memory > - > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
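As an aside for readers, a minimal sketch of the behaviour being asked for ({{utf8StringFromAnyBase}} is a hypothetical helper, not Spark API, and not the patch attached to this ticket): copying the bytes through {{Platform.copyMemory}} works for both on-heap rows (non-null base object) and off-heap rows (null base object plus an absolute address).
{code}
import org.apache.spark.unsafe.Platform
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical helper: unlike the fromAddress behaviour described above, it
// does not return null when the base object is null (the off-heap case).
def utf8StringFromAnyBase(base: AnyRef, offset: Long, numBytes: Int): UTF8String = {
  val bytes = new Array[Byte](numBytes)
  Platform.copyMemory(base, offset, bytes, Platform.BYTE_ARRAY_OFFSET, numBytes)
  UTF8String.fromBytes(bytes)
}
{code}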
[jira] [Updated] (SPARK-10351) UnsafeRow.getString should handle off-heap backed UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10351: -- Summary: UnsafeRow.getString should handle off-heap backed UnsafeRow (was: UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow) > UnsafeRow.getString should handle off-heap backed UnsafeRow > --- > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10351: -- Summary: UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow (was: UnsafeRow.getUTF8String should handle off-heap memory) > UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow > --- > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10351) UnsafeRow.getString should handle off-heap backed UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10351: Assignee: Apache Spark > UnsafeRow.getString should handle off-heap backed UnsafeRow > --- > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Assignee: Apache Spark >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10351) UnsafeRow.getString should handle off-heap backed UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10351: Assignee: (was: Apache Spark) > UnsafeRow.getString should handle off-heap backed UnsafeRow > --- > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10351) UnsafeRow.getString should handle off-heap backed UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721302#comment-14721302 ] Apache Spark commented on SPARK-10351: -- User 'feynmanliang' has created a pull request for this issue: https://github.com/apache/spark/pull/8523 > UnsafeRow.getString should handle off-heap backed UnsafeRow > --- > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10352) Replace internal usages of String with UTF8String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Summary: Replace internal usages of String with UTF8String (was: BaseGenericInternalRow.getUTF8String should support java.lang.String) > Replace internal usages of String with UTF8String > - > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} > Although {{StringType}} should in theory only have internal type > {{UTF8String}}, we [are inconsistent with this > constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] > and being more strict would [break existing > code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10352) Replace SQLTestData internal usages of String with UTF8String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Summary: Replace SQLTestData internal usages of String with UTF8String (was: Replace internal usages of String with UTF8String) > Replace SQLTestData internal usages of String with UTF8String > - > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} > Although {{StringType}} should in theory only have internal type > {{UTF8String}}, we [are inconsistent with this > constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] > and being more strict would [break existing > code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10301) For struct type, if parquet's global schema has less fields than a file's schema, data reading will fail
[ https://issues.apache.org/jira/browse/SPARK-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-10301: - Target Version/s: 1.6.0 (was: 1.5.0) > For struct type, if parquet's global schema has less fields than a file's > schema, data reading will fail > > > Key: SPARK-10301 > URL: https://issues.apache.org/jira/browse/SPARK-10301 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Critical > > When parquet's global schema has fewer fields than the local schema > of a file, the data reading path will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10334) Partitioned table scan's query plan does not show Filter and Project on top of the table scan
[ https://issues.apache.org/jira/browse/SPARK-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-10334: - Target Version/s: 1.5.0 (was: 1.6.0, 1.5.1) > Partitioned table scan's query plan does not show Filter and Project on top > of the table scan > - > > Key: SPARK-10334 > URL: https://issues.apache.org/jira/browse/SPARK-10334 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Critical > Fix For: 1.5.0 > > > {code} > Seq(Tuple2(1, 1), Tuple2(2, 2)).toDF("i", > "j").write.format("parquet").partitionBy("i").save("/tmp/testFilter_partitioned") > val df1 = > sqlContext.read.format("parquet").load("/tmp/testFilter_partitioned") > df1.selectExpr("hash(i)", "hash(j)").show > df1.filter("hash(j) = 1").explain > == Physical Plan == > Scan ParquetRelation[file:/tmp/testFilter_partitioned][j#20,i#21] > {code} > The reason appears to be that we correctly apply the project and filter, > then create an RDD for the result and manually create a PhysicalRDD, so the > Project and Filter on top of the original table scan disappear from the > physical plan. > See > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L138-L175 > We will not generate wrong results, but the query plan is confusing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10301) For struct type, if parquet's global schema has less fields than a file's schema, data reading will fail
[ https://issues.apache.org/jira/browse/SPARK-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721312#comment-14721312 ] Yin Huai commented on SPARK-10301: -- https://github.com/apache/spark/pull/8515 has been merged. It is not the fix for this issue, but it will give users a nice error message when the global schema has fewer struct fields than the local parquet file schema (it will ask users to enable schema merging). I am re-targeting this issue to 1.6 for the proper fix (https://github.com/apache/spark/pull/8509). > For struct type, if parquet's global schema has less fields than a file's > schema, data reading will fail > > > Key: SPARK-10301 > URL: https://issues.apache.org/jira/browse/SPARK-10301 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Critical > > When parquet's global schema has fewer fields than the local schema > of a file, the data reading path will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
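Until the proper fix lands, a sketch of the workaround the error message points to (the path is a placeholder): enable Parquet schema merging so the global schema becomes the union of all part-files' local schemas.
{code}
// Merge the schemas of all part-files instead of picking a single one;
// struct fields missing from the summary schema are then resolved.
val df = sqlContext.read
  .option("mergeSchema", "true")
  .parquet("/path/to/partitioned/table")
{code}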
[jira] [Updated] (SPARK-10301) For struct type, if parquet's global schema has less fields than a file's schema, data reading will fail
[ https://issues.apache.org/jira/browse/SPARK-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-10301: - Assignee: Cheng Lian (was: Yin Huai) > For struct type, if parquet's global schema has less fields than a file's > schema, data reading will fail > > > Key: SPARK-10301 > URL: https://issues.apache.org/jira/browse/SPARK-10301 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Assignee: Cheng Lian >Priority: Critical > > When parquet's global schema has fewer fields than the local schema > of a file, the data reading path will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9514) Add EventHubsReceiver to support Spark Streaming using Azure EventHubs
[ https://issues.apache.org/jira/browse/SPARK-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-9514: Fix Version/s: (was: 1.5.0) > Add EventHubsReceiver to support Spark Streaming using Azure EventHubs > -- > > Key: SPARK-9514 > URL: https://issues.apache.org/jira/browse/SPARK-9514 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.4.1 >Reporter: shanyu zhao > Attachments: SPARK-9514.patch > > > We need to add an EventHubsReceiver implementation to support Spark Streaming > applications that receive data from Azure EventHubs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9976) create function do not work
[ https://issues.apache.org/jira/browse/SPARK-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-9976: Fix Version/s: (was: 1.4.2) (was: 1.5.0) > create function do not work > --- > > Key: SPARK-9976 > URL: https://issues.apache.org/jira/browse/SPARK-9976 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.4.1, 1.5.0 > Environment: spark 1.4.1 yarn 2.2.0 >Reporter: cen yuhai > > I use beeline to connect to ThriftServer, but add jar does not work, so I use > create function instead; see the link below. > http://www.cloudera.com/content/cloudera/en/documentation/core/v5-3-x/topics/cm_mc_hive_udf.html > I do as below: > {code} > create function gdecodeorder as 'com.hive.udf.GOrderDecode' USING JAR > 'hdfs://mycluster/user/spark/lib/gorderdecode.jar'; > {code} > It returns OK, and when I connect to the metastore, I see records in table FUNCS. > {code} > select gdecodeorder(t1) from tableX limit 1; > {code} > It returns the error 'Couldn't find function default.gdecodeorder' > This is the exception: > {code} > 15/08/14 14:53:51 ERROR UserGroupInformation: PriviledgedActionException > as:xiaoju (auth:SIMPLE) cause:org.apache.hive.service.cli.HiveSQLException: > java.lang.RuntimeException: Couldn't find function default.gdecodeorder > 15/08/14 15:04:47 ERROR RetryingHMSHandler: > MetaException(message:NoSuchObjectException(message:Function > default.t_gdecodeorder does not exist)) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:4613) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_function(HiveMetaStore.java:4740) > at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) > at com.sun.proxy.$Proxy21.get_function(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getFunction(HiveMetaStoreClient.java:1721) > at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89) > at com.sun.proxy.$Proxy22.getFunction(Unknown Source) > at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:2662) > at > org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfoFromMetastore(FunctionRegistry.java:546) > at > org.apache.hadoop.hive.ql.exec.FunctionRegistry.getQualifiedFunctionInfo(FunctionRegistry.java:579) > at > org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:645) > at > org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:652) > at > org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUdfs.scala:54) > at > org.apache.spark.sql.hive.HiveContext$$anon$3.org$apache$spark$sql$catalyst$analysis$OverrideFunctionRegistry$$super$lookupFunction(HiveContext.scala:376) > at > org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$$anonfun$lookupFunction$2.apply(FunctionRegistry.scala:44) > at > org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$$anonfun$lookupFunction$2.apply(FunctionRegistry.scala:44) > at scala.Option.getOrElse(Option.scala:120) > at > 
org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$class.lookupFunction(FunctionRegistry.scala:44) > at > org.apache.spark.sql.hive.HiveContext$$anon$3.lookupFunction(HiveContext.scala:376) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:465) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:463) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:221) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:242) > at scala.collection.Iterator$$ano
[jira] [Commented] (SPARK-9976) create function do not work
[ https://issues.apache.org/jira/browse/SPARK-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721313#comment-14721313 ] Yin Huai commented on SPARK-9976: - Can you try our 1.5 branch and see if add jar in the thrift server works? > create function do not work > --- > > Key: SPARK-9976 > URL: https://issues.apache.org/jira/browse/SPARK-9976 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.4.1, 1.5.0 > Environment: spark 1.4.1 yarn 2.2.0 >Reporter: cen yuhai > > I use beeline to connect to ThriftServer, but add jar does not work, so I use > create function instead; see the link below. > http://www.cloudera.com/content/cloudera/en/documentation/core/v5-3-x/topics/cm_mc_hive_udf.html > I do as below: > {code} > create function gdecodeorder as 'com.hive.udf.GOrderDecode' USING JAR > 'hdfs://mycluster/user/spark/lib/gorderdecode.jar'; > {code} > It returns OK, and when I connect to the metastore, I see records in table FUNCS. > {code} > select gdecodeorder(t1) from tableX limit 1; > {code} > It returns the error 'Couldn't find function default.gdecodeorder' > This is the exception: > {code} > 15/08/14 14:53:51 ERROR UserGroupInformation: PriviledgedActionException > as:xiaoju (auth:SIMPLE) cause:org.apache.hive.service.cli.HiveSQLException: > java.lang.RuntimeException: Couldn't find function default.gdecodeorder > 15/08/14 15:04:47 ERROR RetryingHMSHandler: > MetaException(message:NoSuchObjectException(message:Function > default.t_gdecodeorder does not exist)) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:4613) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_function(HiveMetaStore.java:4740) > at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) > at com.sun.proxy.$Proxy21.get_function(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getFunction(HiveMetaStoreClient.java:1721) > at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89) > at com.sun.proxy.$Proxy22.getFunction(Unknown Source) > at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:2662) > at > org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfoFromMetastore(FunctionRegistry.java:546) > at > org.apache.hadoop.hive.ql.exec.FunctionRegistry.getQualifiedFunctionInfo(FunctionRegistry.java:579) > at > org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:645) > at > org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:652) > at > org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUdfs.scala:54) > at > org.apache.spark.sql.hive.HiveContext$$anon$3.org$apache$spark$sql$catalyst$analysis$OverrideFunctionRegistry$$super$lookupFunction(HiveContext.scala:376) > at > org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$$anonfun$lookupFunction$2.apply(FunctionRegistry.scala:44) > at > 
org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$$anonfun$lookupFunction$2.apply(FunctionRegistry.scala:44) > at scala.Option.getOrElse(Option.scala:120) > at > org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$class.lookupFunction(FunctionRegistry.scala:44) > at > org.apache.spark.sql.hive.HiveContext$$anon$3.lookupFunction(HiveContext.scala:376) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:465) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:463) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:221) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNod
[jira] [Updated] (SPARK-10110) StringIndexer lacks of parameter "handleInvalid".
[ https://issues.apache.org/jira/browse/SPARK-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-10110: - Fix Version/s: (was: 1.5.0) > StringIndexer lacks of parameter "handleInvalid". > - > > Key: SPARK-10110 > URL: https://issues.apache.org/jira/browse/SPARK-10110 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Reporter: Kai Sasaki > Labels: ML > > Missing API for pyspark {{StringIndexer.handleInvalid}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10110) StringIndexer lacks of parameter "handleInvalid".
[ https://issues.apache.org/jira/browse/SPARK-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721314#comment-14721314 ] Yin Huai commented on SPARK-10110: -- I am removing the fix version since this field will not be set until the PR gets merged. > StringIndexer lacks of parameter "handleInvalid". > - > > Key: SPARK-10110 > URL: https://issues.apache.org/jira/browse/SPARK-10110 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Reporter: Kai Sasaki > Labels: ML > > Missing API for pyspark {{StringIndexer.handleInvalid}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-10352) Replace SQLTestData internal usages of String with UTF8String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang closed SPARK-10352. - Resolution: Not A Problem Caused by my code not respecting that {{InternalRow}} can only contain {{UTF8String}}, not {{java.lang.String}}. > Replace SQLTestData internal usages of String with UTF8String > - > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} > Although {{StringType}} should in theory only have internal type > {{UTF8String}}, we [are inconsistent with this > constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] > and being more strict would [break existing > code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
[ https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721316#comment-14721316 ] Yin Huai commented on SPARK-1564: - https://github.com/apache/spark/pull/7169 has been merged and it is included in both 1.5.0-rc1 and 1.5.0-rc2. I am resolving this issue. > Add JavaScript into Javadoc to turn ::Experimental:: and such into badges > - > > Key: SPARK-1564 > URL: https://issues.apache.org/jira/browse/SPARK-1564 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Matei Zaharia >Assignee: Andrew Or >Priority: Minor > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10351) UnsafeRow.getString should handle off-heap backed UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721318#comment-14721318 ] Reynold Xin commented on SPARK-10351: - getString is only used in debugging I think? > UnsafeRow.getString should handle off-heap backed UnsafeRow > --- > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
[ https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-1564. - Resolution: Fixed > Add JavaScript into Javadoc to turn ::Experimental:: and such into badges > - > > Key: SPARK-1564 > URL: https://issues.apache.org/jira/browse/SPARK-1564 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Matei Zaharia >Assignee: Andrew Or >Priority: Minor > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
[ https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721319#comment-14721319 ] Yin Huai commented on SPARK-1564: - [~andrewor14] seems I cannot change the assignee to [~deron]. Can you assign it to him? > Add JavaScript into Javadoc to turn ::Experimental:: and such into badges > - > > Key: SPARK-1564 > URL: https://issues.apache.org/jira/browse/SPARK-1564 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Matei Zaharia >Assignee: Andrew Or >Priority: Minor > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10351) UnsafeRow.getString should handle off-heap backed UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721325#comment-14721325 ] Feynman Liang commented on SPARK-10351: --- Sorry, the fix is for {{getUTF8String}}. {{getString}} is the method which causes the {{NullPointerException}}. Updated title. > UnsafeRow.getString should handle off-heap backed UnsafeRow > --- > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10351: -- Summary: UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow (was: UnsafeRow.getString should handle off-heap backed UnsafeRow) > UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow > --- > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8684) Update R version in Spark EC2 AMI
[ https://issues.apache.org/jira/browse/SPARK-8684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721324#comment-14721324 ] Yin Huai commented on SPARK-8684: - Should we resolve it? > Update R version in Spark EC2 AMI > - > > Key: SPARK-8684 > URL: https://issues.apache.org/jira/browse/SPARK-8684 > Project: Spark > Issue Type: Improvement > Components: EC2, SparkR >Reporter: Shivaram Venkataraman >Priority: Minor > Fix For: 1.5.0 > > > Right now the R version in the AMI is 3.1 -- However a number of R libraries > need R version 3.2 and it will be good to update the R version on the AMI > while launching a EC2 cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9991) Create local limit operator
[ https://issues.apache.org/jira/browse/SPARK-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9991. Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 1.6.0 > Create local limit operator > --- > > Key: SPARK-9991 > URL: https://issues.apache.org/jira/browse/SPARK-9991 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Shixiong Zhu > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9986) Create a simple test framework for local operators
[ https://issues.apache.org/jira/browse/SPARK-9986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9986. Resolution: Fixed Fix Version/s: 1.6.0 > Create a simple test framework for local operators > -- > > Key: SPARK-9986 > URL: https://issues.apache.org/jira/browse/SPARK-9986 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Shixiong Zhu > Fix For: 1.6.0 > > > It'd be great if we can just create local query plans and test the > correctness of their implementation directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9993) Create local union operator
[ https://issues.apache.org/jira/browse/SPARK-9993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9993. Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 1.6.0 > Create local union operator > --- > > Key: SPARK-9993 > URL: https://issues.apache.org/jira/browse/SPARK-9993 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Shixiong Zhu > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-6817) DataFrame UDFs in R
[ https://issues.apache.org/jira/browse/SPARK-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717079#comment-14717079 ] Reynold Xin edited comment on SPARK-6817 at 8/30/15 1:14 AM: - Here are some suggestions on the proposed API. If the idea is to keep the API close to R's current primitives, we should avoid introducing too many new keywords. E.g., dapplyCollect can be expressed as collect(dapply(...)). Since collect already exists in Spark, and R users are comfortable with the syntax as part of dplyr, we should reuse the keyword instead of introducing a new function dapplyCollect. Relying on existing syntax will reduce the learning curve for users. Was performance the primary intent to introduce dapplyCollect instead of collect(dapply(...))? Similarly, can we do away with gapply and gapplyCollect, and express it using dapply? In R, the function "split" provides grouping (https://stat.ethz.ch/R-manual/R-devel/library/base/html/split.html). One should be able to implement "split" using GroupBy in Spark. "gapply" can then be expressed in terms of dapply and split, and gapplyCollect will become collect(dapply(..split..)). Here is a simple example that uses split and lapply in R: {code} df<-data.frame(city=c("A","B","A","D"), age=c(10,12,23,5)) print(df) s<-split(df$age, df$city) lapply(s, mean) {code} was (Author: indrajit): Here are some suggestions on the proposed API. If the idea is to keep the API close to R's current primitives, we should avoid introducing too many new keywords. E.g., dapplyCollect can be expressed as collect(dapply(...)). Since collect already exists in Spark, and R users are comfortable with the syntax as part of dplyr, we should reuse the keyword instead of introducing a new function dapplyCollect. Relying on existing syntax will reduce the learning curve for users. Was performance the primary intent to introduce dapplyCollect instead of collect(dapply(...))? Similarly, can we do away with gapply and gapplyCollect, and express it using dapply? In R, the function "split" provides grouping (https://stat.ethz.ch/R-manual/R-devel/library/base/html/split.html). One should be able to implement "split" using GroupBy in Spark. "gapply" can then be expressed in terms of dapply and split, and gapplyCollect will become collect(dapply(..split..)). Here is a simple example that uses split and lapply in R: df<-data.frame(city=c("A","B","A","D"), age=c(10,12,23,5)) print(df) s<-split(df$age, df$city) lapply(s, mean) > DataFrame UDFs in R > --- > > Key: SPARK-6817 > URL: https://issues.apache.org/jira/browse/SPARK-6817 > Project: Spark > Issue Type: New Feature > Components: SparkR, SQL >Reporter: Shivaram Venkataraman > > This depends on some internal interface of Spark SQL, should be done after > merging into Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9078) Use of non-standard LIMIT keyword in JDBC tableExists code
[ https://issues.apache.org/jira/browse/SPARK-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721328#comment-14721328 ] Reynold Xin commented on SPARK-9078: Please submit a pull request, [~tsuresh]. I think it is OK to ignore the option for now. > Use of non-standard LIMIT keyword in JDBC tableExists code > -- > > Key: SPARK-9078 > URL: https://issues.apache.org/jira/browse/SPARK-9078 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.1, 1.4.0 >Reporter: Robert Beauchemin >Priority: Minor > > tableExists in > spark/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcUtils.scala uses > non-standard SQL (specifically, the LIMIT keyword) to determine whether a > table exists in a JDBC data source. This will cause an exception in many/most > JDBC databases that don't support the LIMIT keyword. See > http://stackoverflow.com/questions/1528604/how-universal-is-the-limit-statement-in-sql > To check for table existence without an exception, the query could be > recrafted around "select 1 from $table where 0 = 1". This isn't the same (it > returns an empty resultset rather than the value '1'), but it would support > more data sources and also support empty tables. Arguably ugly, and it > possibly queries every row on sources that don't support constant folding, > but it is better than failing on JDBC sources that don't support LIMIT. > Perhaps "supports LIMIT" could be a field in the JdbcDialect class for > databases that support the keyword to override. The ANSI standard is (OFFSET > and) FETCH. > The standard way to check for table existence would be to use > information_schema.tables, which is a SQL standard but may not work for JDBC > data sources that support SQL but not the information_schema. The JDBC > DatabaseMetaData interface provides getSchemas(), which allows checking for > the information_schema in drivers that support it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
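A sketch of the suggested portable check as plain JDBC (a standalone illustration, not the Spark patch; the table name is assumed to come from a trusted source):
{code}
import java.sql.Connection

// "WHERE 0 = 1" returns an empty result set on any SQL source, so a
// successful query proves the table exists without fetching any rows;
// an SQLException is taken to mean it does not exist (or is unreadable).
def tableExists(conn: Connection, table: String): Boolean = {
  try {
    val stmt = conn.prepareStatement(s"SELECT 1 FROM $table WHERE 0 = 1")
    try { stmt.executeQuery(); true } finally { stmt.close() }
  } catch {
    case _: java.sql.SQLException => false
  }
}
{code}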
[jira] [Updated] (SPARK-10308) %in% is not exported in SparkR
[ https://issues.apache.org/jira/browse/SPARK-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10308: Fix Version/s: (was: 1.5.1) (was: 1.6.0) 1.5.0 > %in% is not exported in SparkR > -- > > Key: SPARK-10308 > URL: https://issues.apache.org/jira/browse/SPARK-10308 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.5.0 >Reporter: Shivaram Venkataraman >Assignee: Shivaram Venkataraman > Fix For: 1.5.0 > > > While the operator is defined in Column.R it is not exported in our NAMESPACE > file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10287) After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table
[ https://issues.apache.org/jira/browse/SPARK-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10287: Fix Version/s: (was: 1.5.1) 1.5.0 > After processing a query using JSON data, Spark SQL continuously refreshes > metadata of the table > > > Key: SPARK-10287 > URL: https://issues.apache.org/jira/browse/SPARK-10287 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Critical > Labels: releasenotes > Fix For: 1.5.0 > > > I have a partitioned json table with 1824 partitions. > {code} > val df = sqlContext.read.format("json").load("aPartitionedJsonData") > val columnStr = df.schema.map(_.name).mkString(",") > println(s"columns: $columnStr") > val hash = df > .selectExpr(s"hash($columnStr) as hashValue") > .groupBy() > .sum("hashValue") > .head() > .getLong(0) > {code} > Looks like for JSON, we refresh metadata when we call buildScan. For a > partitioned table, we call buildScan for every partition. So, looks like we > will refresh this table 1824 times. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10188) Pyspark CrossValidator with RMSE selects incorrect model
[ https://issues.apache.org/jira/browse/SPARK-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10188: Fix Version/s: (was: 1.5.1) 1.5.0 > Pyspark CrossValidator with RMSE selects incorrect model > > > Key: SPARK-10188 > URL: https://issues.apache.org/jira/browse/SPARK-10188 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.5.0 >Reporter: Noel Smith >Assignee: Noel Smith >Priority: Critical > Fix For: 1.5.0 > > > Pyspark {{CrossValidator}} is giving incorrect results when selecting > estimators using RMSE as an evaluation metric. > In the example below, it should be selecting the {{LinearRegression}} > estimator with zero regularization, as that gives the most accurate result, > but instead it selects the one with the largest regularization. > Probably related to: SPARK-10097 > {code} > from pyspark.ml.evaluation import RegressionEvaluator > from pyspark.ml.regression import LinearRegression > from pyspark.ml.tuning import ParamGridBuilder, CrossValidator, > CrossValidatorModel > from pyspark.ml.feature import Binarizer > from pyspark.mllib.linalg import Vectors > from pyspark.sql import SQLContext > sqlContext = SQLContext(sc) > # Label = 2 * feature > train = sqlContext.createDataFrame([ > (Vectors.dense([10.0]), 20.0), > (Vectors.dense([100.0]), 200.0), > (Vectors.dense([1000.0]), 2000.0)] * 10, > ["features", "label"]) > test = sqlContext.createDataFrame([ > (Vectors.dense([1000.0]),)], > ["features"]) > # Expected prediction 2000.0 > print LinearRegression(regParam=0.0).fit(train).transform(test).collect() # > Predicts 2000.0 (perfect) > print LinearRegression(regParam=100.0).fit(train).transform(test).collect() # > Predicts 1869.31 > print > LinearRegression(regParam=100.0).fit(train).transform(test).collect() # > 741.08 (worst) > # Cross-validation > lr = LinearRegression() > rmse_eval = RegressionEvaluator(metricName="rmse") > grid = (ParamGridBuilder() > .addGrid( lr.regParam, [0.0, 100.0, 100.0] ) > .build()) > cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, > evaluator=rmse_eval) > cv_model = cv.fit(train) > cv_model.bestModel.transform(test).collect() # Predicts 741.08 (i.e. worst > model selected) > {code} > One workaround for users would be to add a wrapper around the supplied > evaluator to invert the metric: > {code} > class InvertedEvaluator(Evaluator): > def __init__(self, evaluator): > super(InvertedEvaluator, self).__init__() > self.evaluator = evaluator > > def _evaluate(self, dataset): > return -self.evaluator.evaluate(dataset) > invertedEvaluator = InvertedEvaluator(RegressionEvaluator(metricName="rmse")) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9671) ML 1.5 QA: Programming guide update and migration guide
[ https://issues.apache.org/jira/browse/SPARK-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9671: --- Fix Version/s: (was: 1.5.1) 1.5.0 > ML 1.5 QA: Programming guide update and migration guide > --- > > Key: SPARK-9671 > URL: https://issues.apache.org/jira/browse/SPARK-9671 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Joseph K. Bradley >Assignee: Xiangrui Meng >Priority: Critical > Fix For: 1.5.0 > > > Before the release, we need to update the MLlib Programming Guide. Updates > will include: > * Add migration guide subsection. > ** Use the results of the QA audit JIRAs. > * Check phrasing, especially in main sections (for outdated items such as "In > this release, ...") > * Possibly reorganize parts of the Pipelines guide if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10188) Pyspark CrossValidator with RMSE selects incorrect model
[ https://issues.apache.org/jira/browse/SPARK-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10188: Target Version/s: 1.5.0 (was: 1.5.1) > Pyspark CrossValidator with RMSE selects incorrect model > > > Key: SPARK-10188 > URL: https://issues.apache.org/jira/browse/SPARK-10188 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.5.0 >Reporter: Noel Smith >Assignee: Noel Smith >Priority: Critical > Fix For: 1.5.0 > > > Pyspark {{CrossValidator}} is giving incorrect results when selecting > estimators using RMSE as an evaluation metric. > In the example below, it should be selecting the {{LinearRegression}} > estimator with zero regularization, as that gives the most accurate result, > but instead it selects the one with the largest regularization. > Probably related to: SPARK-10097 > {code} > from pyspark.ml.evaluation import RegressionEvaluator > from pyspark.ml.regression import LinearRegression > from pyspark.ml.tuning import ParamGridBuilder, CrossValidator, > CrossValidatorModel > from pyspark.ml.feature import Binarizer > from pyspark.mllib.linalg import Vectors > from pyspark.sql import SQLContext > sqlContext = SQLContext(sc) > # Label = 2 * feature > train = sqlContext.createDataFrame([ > (Vectors.dense([10.0]), 20.0), > (Vectors.dense([100.0]), 200.0), > (Vectors.dense([1000.0]), 2000.0)] * 10, > ["features", "label"]) > test = sqlContext.createDataFrame([ > (Vectors.dense([1000.0]),)], > ["features"]) > # Expected prediction 2000.0 > print LinearRegression(regParam=0.0).fit(train).transform(test).collect() # > Predicts 2000.0 (perfect) > print LinearRegression(regParam=100.0).fit(train).transform(test).collect() # > Predicts 1869.31 > print > LinearRegression(regParam=100.0).fit(train).transform(test).collect() # > 741.08 (worst) > # Cross-validation > lr = LinearRegression() > rmse_eval = RegressionEvaluator(metricName="rmse") > grid = (ParamGridBuilder() > .addGrid( lr.regParam, [0.0, 100.0, 100.0] ) > .build()) > cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, > evaluator=rmse_eval) > cv_model = cv.fit(train) > cv_model.bestModel.transform(test).collect() # Predicts 741.08 (i.e. worst > model selected) > {code} > One workaround for users would be to add a wrapper around the supplied > evaluator to invert the metric: > {code} > class InvertedEvaluator(Evaluator): > def __init__(self, evaluator): > super(InvertedEvaluator, self).__init__() > self.evaluator = evaluator > > def _evaluate(self, dataset): > return -self.evaluator.evaluate(dataset) > invertedEvaluator = InvertedEvaluator(RegressionEvaluator(metricName="rmse")) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10219) Error when additional options provided as variable in write.df
[ https://issues.apache.org/jira/browse/SPARK-10219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10219: Fix Version/s: (was: 1.5.1) (was: 1.6.0) 1.5.0 > Error when additional options provided as variable in write.df > -- > > Key: SPARK-10219 > URL: https://issues.apache.org/jira/browse/SPARK-10219 > Project: Spark > Issue Type: Bug > Components: R >Affects Versions: 1.4.0 > Environment: SparkR shell >Reporter: Samuel Alexander >Assignee: Shivaram Venkataraman > Labels: spark-shell, sparkR > Fix For: 1.5.0 > > > Opened a SparkR shell > Created a df using > > df <- jsonFile(sqlContext, "examples/src/main/resources/people.json") > Assigned a variable like below > > mode <- "append" > When write.df was called using the statement below, the following error occurred > > write.df(df, source="org.apache.spark.sql.parquet", path=par_path, > > option=mode) > Error in writeType(con, type) : Unsupported type for serialization name > Whereas when "append" is passed directly, i.e. not via the mode variable, as > below, everything works fine > > write.df(df, source="org.apache.spark.sql.parquet", path=par_path, > > option="append") > Note: For parquet it is not necessary to pass options. But we are using the Spark > Salesforce package > (http://spark-packages.org/package/springml/spark-salesforce), which requires > additional options to be passed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10336) fitIntercept is a command line option but not set in the LR example program.
[ https://issues.apache.org/jira/browse/SPARK-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10336: Fix Version/s: (was: 1.5.1) 1.5.0 > fitIntercept is a command line option but not set in the LR example program. > > > Key: SPARK-10336 > URL: https://issues.apache.org/jira/browse/SPARK-10336 > Project: Spark > Issue Type: Bug > Components: Documentation, ML >Affects Versions: 1.4.1, 1.5.0 >Reporter: Shuo Xiang >Assignee: Shuo Xiang > Fix For: 1.5.0 > > > The fitIntercept value parsed from the command line is never set in the example program. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10328) na.omit has too restrictive generic in SparkR
[ https://issues.apache.org/jira/browse/SPARK-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10328: Fix Version/s: (was: 1.5.1) (was: 1.6.0) 1.5.0 > na.omit has too restrictive generic in SparkR > - > > Key: SPARK-10328 > URL: https://issues.apache.org/jira/browse/SPARK-10328 > Project: Spark > Issue Type: Bug > Components: SparkR >Reporter: Shivaram Venkataraman >Assignee: Shivaram Venkataraman > Fix For: 1.5.0 > > > It should match the S3 function definition -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10295) Dynamic allocation in Mesos does not release when RDDs are cached
[ https://issues.apache.org/jira/browse/SPARK-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10295: Fix Version/s: (was: 1.5.1) (was: 1.6.0) 1.5.0 > Dynamic allocation in Mesos does not release when RDDs are cached > - > > Key: SPARK-10295 > URL: https://issues.apache.org/jira/browse/SPARK-10295 > Project: Spark > Issue Type: Improvement > Components: Documentation, Spark Core >Affects Versions: 1.5.0 > Environment: Spark 1.5.0 RC1 > Centos 6 > java 7 oracle >Reporter: Hans van den Bogert >Assignee: Sean Owen >Priority: Minor > Fix For: 1.5.0 > > > When running Spark in coarse-grained mode with the shuffle service and dynamic > allocation, the driver does not release executors if a dataset is cached. > The console output OTOH shows: > > 15/08/26 17:29:58 WARN SparkContext: Dynamic allocation currently does not > > support cached RDDs. Cached data for RDD 9 will be lost when executors are > > removed. > However, after the default idle timeout of 1m, executors are not released. When I perform > the same initial setup, loading data, etc., but without caching, the executors > are released. > Is this intended behaviour? > If this is intended behaviour, the console warning is misleading. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9316) Add support for filtering using `[` (synonym for filter / select)
[ https://issues.apache.org/jira/browse/SPARK-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9316: --- Fix Version/s: (was: 1.5.1) (was: 1.6.0) 1.5.0 > Add support for filtering using `[` (synonym for filter / select) > - > > Key: SPARK-9316 > URL: https://issues.apache.org/jira/browse/SPARK-9316 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Shivaram Venkataraman >Assignee: Felix Cheung > Fix For: 1.5.0 > > > Will help us support queries of the form > {code} > air[air$UniqueCarrier %in% c("UA", "HA"), c(1,2,3,5:9)] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8952) jsonFile() of SQLContext displays an improper warning message for an S3 path
[ https://issues.apache.org/jira/browse/SPARK-8952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8952: --- Fix Version/s: (was: 1.5.1) (was: 1.6.0) 1.5.0 > jsonFile() of SQLContext displays an improper warning message for an S3 path > --- > > Key: SPARK-8952 > URL: https://issues.apache.org/jira/browse/SPARK-8952 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.4.0 >Reporter: Sun Rui >Assignee: Luciano Resende > Fix For: 1.5.0 > > > This is an issue reported by Ben Spark. > {quote} > Spark 1.4 deployed on AWS EMR > "jsonFile" works, though with a warning message: > Warning message: > In normalizePath(path) : > > path[1]="s3://rea-consumer-data-dev/cbr/profiler/output/20150618/part-0": > No such file or directory > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9890) User guide for CountVectorizer
[ https://issues.apache.org/jira/browse/SPARK-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9890: --- Fix Version/s: (was: 1.5.1) 1.5.0 > User guide for CountVectorizer > -- > > Key: SPARK-9890 > URL: https://issues.apache.org/jira/browse/SPARK-9890 > Project: Spark > Issue Type: Documentation > Components: ML >Reporter: Feynman Liang >Assignee: yuhao yang > Fix For: 1.5.0 > > > SPARK-8703 added a count vectorizer as an ML transformer. We should add an > accompanying user guide to {{ml-features}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10315) remove document on spark.akka.failure-detector.threshold
[ https://issues.apache.org/jira/browse/SPARK-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10315: Fix Version/s: (was: 1.5.1) (was: 1.6.0) 1.5.0 > remove document on spark.akka.failure-detector.threshold > > > Key: SPARK-10315 > URL: https://issues.apache.org/jira/browse/SPARK-10315 > Project: Spark > Issue Type: Bug > Components: Documentation >Reporter: Nan Zhu >Assignee: Nan Zhu >Priority: Minor > Fix For: 1.5.0 > > > This parameter is no longer used, and there is a mistake in the current > document: the name should be 'akka.remote.watch-failure-detector.threshold'. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10304) Partition discovery does not throw an exception if the dir structure is invalid
[ https://issues.apache.org/jira/browse/SPARK-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-10304: - Target Version/s: 1.5.1, 1.6.0 (was: 1.5.0) > Partition discovery does not throw an exception if the dir structure is invalid > - > > Key: SPARK-10304 > URL: https://issues.apache.org/jira/browse/SPARK-10304 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Assignee: Zhan Zhang >Priority: Critical > > I have a dir structure like {{/path/table1/partition_column=1/}}. When I try > to use {{load("/path/")}}, it works and I get a DF. When I query this DF, if > it is stored as ORC, there will be the following NPE. But, if it is Parquet, > we can even return rows. We should complain to users about the dir structure > because {{table1}} does not match the expected format. > {code} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 26 in > stage 57.0 failed 4 times, most recent failure: Lost task 26.3 in stage 57.0 > (TID 3504, 10.0.195.227): java.lang.NullPointerException > at > org.apache.spark.sql.hive.HiveInspectors$class.unwrapperFor(HiveInspectors.scala:466) > at > org.apache.spark.sql.hive.orc.OrcTableScan.unwrapperFor(OrcRelation.scala:224) > at > org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$org$apache$spark$sql$hive$orc$OrcTableScan$$fillObject$1$$anonfun$9.apply(OrcRelation.scala:261) > at > org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$org$apache$spark$sql$hive$orc$OrcTableScan$$fillObject$1$$anonfun$9.apply(OrcRelation.scala:261) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at scala.collection.immutable.List.foreach(List.scala:318) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at > org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$org$apache$spark$sql$hive$orc$OrcTableScan$$fillObject$1.apply(OrcRelation.scala:261) > at > org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$org$apache$spark$sql$hive$orc$OrcTableScan$$fillObject$1.apply(OrcRelation.scala:256) > at scala.Option.map(Option.scala:145) > at > org.apache.spark.sql.hive.orc.OrcTableScan.org$apache$spark$sql$hive$orc$OrcTableScan$$fillObject(OrcRelation.scala:256) > at > org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$execute$3.apply(OrcRelation.scala:318) > at > org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$execute$3.apply(OrcRelation.scala:316) > at > org.apache.spark.rdd.HadoopRDD$HadoopMapPartitionsWithSplitRDD.compute(HadoopRDD.scala:380) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
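A hypothetical PySpark sketch of the reported shape of the problem (paths and data are invented for illustration, and a running {{sc}} is assumed; this code is not from the original report). The extra {{table1}} level means the load root is not a valid partitioned-table root, so the load should fail fast rather than silently succeed:
{code}
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Write data one directory level below the intended table root:
# /tmp/path/table1/partition_column=1/ instead of /tmp/path/partition_column=1/
df.write.format("parquet").save("/tmp/path/table1/partition_column=1/")

# Partition discovery on the malformed root neither fails fast nor warns;
# for ORC the mismatch later surfaces as the NullPointerException above.
bad = sqlContext.read.format("parquet").load("/tmp/path/")
bad.show()
{code}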
[jira] [Commented] (SPARK-9523) Receiver for Spark Streaming does not naturally support kryo serializer
[ https://issues.apache.org/jira/browse/SPARK-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721354#comment-14721354 ] John Chen commented on SPARK-9523: -- The problem here is not about warn/error/etc; it's that if you want to do something special for transient attributes, you have to write your own code. However, for Java and Kryo serialization, the code you write is different. For Java, you write that code in the readObject() and writeObject() methods; for Kryo, you write it in another pair of methods: read() and write(). So if you want to support both Java and Kryo serialization with transient attributes and customized serialization logic, you need to write all 4 methods in your class. For other DStream functions, you do not care about Kryo, as they seem to support only Java serialization, so even if you set the KryoSerializer in SparkConf, the serialization is still done by Java. However, the Receiver in Spark Streaming will be serialized by Kryo if you configure it so, and the real issue here is that THE RECEIVER AND OTHER FUNCTIONS DO NOT ACT THE SAME, which can be confusing for new developers. > Receiver for Spark Streaming does not naturally support kryo serializer > --- > > Key: SPARK-9523 > URL: https://issues.apache.org/jira/browse/SPARK-9523 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.3.1 > Environment: Windows 7 local mode >Reporter: John Chen >Priority: Minor > Labels: kryo, serialization > Original Estimate: 120h > Remaining Estimate: 120h > > In some cases, some attributes in a class are not serializable but you > still want to use them after deserialization of the whole object, so you have to > customize your serialization code. For example, you can declare those > attributes as transient, which makes them ignored during serialization, and > then you can reassign their values during deserialization. > Now, if you're using Java serialization, you'll have to implement > Serializable, and write that code in readObject() and writeObject() > methods; and if you're using Kryo serialization, you'll have to implement > KryoSerializable, and write the code in read() and write() methods. > In Spark and Spark Streaming, you can set Kryo as the serializer for speed. > However, the functions taken by RDD or DStream operations are still > serialized by Java serialization, which means you only need to write the > custom serialization code in readObject() and writeObject() methods. > But when it comes to Spark Streaming's Receiver, things are different. When > you wish to customize an InputDStream, you must extend the Receiver. However, > it turns out the Receiver will be serialized by Kryo if you set the Kryo > serializer in SparkConf, and will fall back to Java serialization if you > didn't. > So here is where the problem comes in: if you want to change the serializer by > configuration and make sure the Receiver runs perfectly for both Java and > Kryo, you'll have to write all 4 methods above. First, it is redundant, > since you'll have to write serialization/deserialization code almost twice; > second, there's nothing in the doc or in the code to inform users to > implement the KryoSerializable interface. > Since all other function parameters are serialized by Java only, I suggest > you also make it so for the Receiver. It may be slower, but since the > serialization is only executed once per interval, it's tolerable. 
More > importantly, it can cause less trouble. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
[ https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-1564: - Assignee: Deron Eriksson (was: Andrew Or) > Add JavaScript into Javadoc to turn ::Experimental:: and such into badges > - > > Key: SPARK-1564 > URL: https://issues.apache.org/jira/browse/SPARK-1564 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Matei Zaharia >Assignee: Deron Eriksson >Priority: Minor > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10353) MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose matrix multiplication
Burak Yavuz created SPARK-10353: --- Summary: MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose matrix multiplication Key: SPARK-10353 URL: https://issues.apache.org/jira/browse/SPARK-10353 Project: Spark Issue Type: Bug Components: MLlib Reporter: Burak Yavuz Basically {code} if (beta != 0.0) { f2jBLAS.dscal(C.values.length, beta, C.values, 1) } {code} should be {code} if (beta != 1.0) { f2jBLAS.dscal(C.values.length, beta, C.values, 1) } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
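For intuition, the BLAS gemm contract is C := alpha * A * B + beta * C: the in-place scaling of C is unnecessary only when beta == 1.0, and beta == 0.0 is exactly the case where the scaling must run, since it is what discards C's stale contents. A minimal NumPy sketch of that contract (an illustration of the semantics only, not Spark's code):
{code}
import numpy as np

def gemm(alpha, A, B, beta, C):
    # gemm contract: C := alpha * A * B + beta * C, updating C in place
    if beta != 1.0:            # the corrected test; beta == 0.0 must zero C
        C *= beta
    C += alpha * A.dot(B)
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
C = np.full((2, 2), 99.0)      # stale values that must not leak when beta = 0

print gemm(2.0, A, B, 0.0, C)  # equals 2 * A.dot(B), independent of the old C
{code}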
[jira] [Updated] (SPARK-10353) MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose matrix multiplication
[ https://issues.apache.org/jira/browse/SPARK-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz updated SPARK-10353: Affects Version/s: 1.5.0 > MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose > matrix multiplication > -- > > Key: SPARK-10353 > URL: https://issues.apache.org/jira/browse/SPARK-10353 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.5.0 >Reporter: Burak Yavuz > > Basically > {code} > if (beta != 0.0) { > f2jBLAS.dscal(C.values.length, beta, C.values, 1) > } > {code} > should be > {code} > if (beta != 1.0) { > f2jBLAS.dscal(C.values.length, beta, C.values, 1) > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10353) MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose matrix multiplication
[ https://issues.apache.org/jira/browse/SPARK-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10353: Assignee: Apache Spark > MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose > matrix multiplication > -- > > Key: SPARK-10353 > URL: https://issues.apache.org/jira/browse/SPARK-10353 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.5.0 >Reporter: Burak Yavuz >Assignee: Apache Spark > > Basically > {code} > if (beta != 0.0) { > f2jBLAS.dscal(C.values.length, beta, C.values, 1) > } > {code} > should be > {code} > if (beta != 1.0) { > f2jBLAS.dscal(C.values.length, beta, C.values, 1) > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10353) MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose matrix multiplication
[ https://issues.apache.org/jira/browse/SPARK-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10353: Assignee: (was: Apache Spark) > MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose > matrix multiplication > -- > > Key: SPARK-10353 > URL: https://issues.apache.org/jira/browse/SPARK-10353 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.5.0 >Reporter: Burak Yavuz > > Basically > {code} > if (beta != 0.0) { > f2jBLAS.dscal(C.values.length, beta, C.values, 1) > } > {code} > should be > {code} > if (beta != 1.0) { > f2jBLAS.dscal(C.values.length, beta, C.values, 1) > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10353) MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose matrix multiplication
[ https://issues.apache.org/jira/browse/SPARK-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721392#comment-14721392 ] Apache Spark commented on SPARK-10353: -- User 'brkyvz' has created a pull request for this issue: https://github.com/apache/spark/pull/8525 > MLlib BLAS gemm outputs wrong result when beta = 0.0 for transpose transpose > matrix multiplication > -- > > Key: SPARK-10353 > URL: https://issues.apache.org/jira/browse/SPARK-10353 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.5.0 >Reporter: Burak Yavuz > > Basically > {code} > if (beta != 0.0) { > f2jBLAS.dscal(C.values.length, beta, C.values, 1) > } > {code} > should be > {code} > if (beta != 1.0) { > f2jBLAS.dscal(C.values.length, beta, C.values, 1) > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10348) Improve Spark ML user guide
[ https://issues.apache.org/jira/browse/SPARK-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-10348. --- Resolution: Fixed Fix Version/s: 1.5.1 Issue resolved by pull request 8517 [https://github.com/apache/spark/pull/8517] > Improve Spark ML user guide > --- > > Key: SPARK-10348 > URL: https://issues.apache.org/jira/browse/SPARK-10348 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > Fix For: 1.5.1 > > > improve ml-guide: > * replace `ML Dataset` by `DataFrame` to simplify the abstraction > * remove links to Scala API doc in the main guide > * change ML algorithms to pipeline components -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org