[jira] [Updated] (SPARK-20723) Random Forest Classifier should expose intermediateRDDStorageLevel similar to ALS
[ https://issues.apache.org/jira/browse/SPARK-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

madhukara phatak updated SPARK-20723:
-------------------------------------
    Description:
Currently the Random Forest implementation caches its intermediate data using the *MEMORY_AND_DISK* storage level. This causes problems in low-memory scenarios, so we should expose an expert param *intermediateStorageLevel* that allows users to customise the storage level. This is similar to the ALS options specified in the JIRA below:
https://issues.apache.org/jira/browse/SPARK-14412

  was:
Currently the Random Forest implementation caches its intermediate data using the *MEMORY_AND_DISK* storage level. This causes problems in low-memory scenarios, so we should expose an expert param *intermediateRDDStorageLevel* that allows users to customise the storage level. This is similar to the ALS options specified in the JIRA below:
https://issues.apache.org/jira/browse/SPARK-14412


> Random Forest Classifier should expose intermediateRDDStorageLevel similar to ALS
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-20723
>                 URL: https://issues.apache.org/jira/browse/SPARK-20723
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: madhukara phatak
>            Priority: Minor
>
> Currently the Random Forest implementation caches its intermediate data using the *MEMORY_AND_DISK* storage level. This causes problems in low-memory scenarios, so we should expose an expert param *intermediateStorageLevel* which allows users to customise the storage level. This is similar to the ALS options specified in the JIRA below:
> https://issues.apache.org/jira/browse/SPARK-14412
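For context, a minimal Scala sketch of what the requested param could look like, assuming it mirrors the ALS API added in SPARK-14412. The ALS setters below exist today; the RandomForestClassifier setter is hypothetical and only illustrates the proposal.

{code:scala}
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.recommendation.ALS

// Existing behaviour this request mirrors: ALS lets expert users tune the
// storage level of its intermediate RDDs (SPARK-14412).
val als = new ALS()
  .setIntermediateStorageLevel("MEMORY_ONLY")    // default is MEMORY_AND_DISK
  .setFinalStorageLevel("MEMORY_AND_DISK")

// Proposed, hypothetical equivalent for Random Forest (does not exist yet):
val rf = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
//  .setIntermediateStorageLevel("DISK_ONLY")    // the expert param this JIRA asks for
{code}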
[jira] [Created] (SPARK-20723) Random Forest Classifier should expose intermediateRDDStorageLevel similar to ALS
madhukara phatak created SPARK-20723:
-------------------------------------

             Summary: Random Forest Classifier should expose intermediateRDDStorageLevel similar to ALS
                 Key: SPARK-20723
                 URL: https://issues.apache.org/jira/browse/SPARK-20723
             Project: Spark
          Issue Type: New Feature
          Components: ML
    Affects Versions: 2.3.0
            Reporter: madhukara phatak
            Priority: Minor


Currently the Random Forest implementation caches its intermediate data using the *MEMORY_AND_DISK* storage level. This causes problems in low-memory scenarios, so we should expose an expert param *intermediateRDDStorageLevel* that allows users to customise the storage level. This is similar to the ALS options specified in the JIRA below:
https://issues.apache.org/jira/browse/SPARK-14412
[jira] [Updated] (SPARK-7084) Improve the saveAsTable documentation
[ https://issues.apache.org/jira/browse/SPARK-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

madhukara phatak updated SPARK-7084:
------------------------------------
    Description:
The documentation of saveAsTable is a little confusing. From the name of the API it sounds like it creates a Hive table that can be accessed from Hive, but that is not the case, as discussed here: [https://www.mailarchive.com/u...@spark.apache.org/msg26902.html]. This issue is to improve the documentation to reflect that.

  was:
The documentation of saveAsTable is a little confusing. From the name of the API it sounds like it creates a Hive table that can be accessed from Hive, but that is not the case, as discussed here[https://www.mailarchive.com/u...@spark.apache.org/msg26902.html]. This issue is to improve the documentation to reflect that.


Improve the saveAsTable documentation
-------------------------------------

                 Key: SPARK-7084
                 URL: https://issues.apache.org/jira/browse/SPARK-7084
             Project: Spark
          Issue Type: Documentation
          Components: SQL
    Affects Versions: 1.3.1
            Reporter: madhukara phatak
            Priority: Minor

The documentation of saveAsTable is a little confusing. From the name of the API it sounds like it creates a Hive table that can be accessed from Hive, but that is not the case, as discussed here: [https://www.mailarchive.com/u...@spark.apache.org/msg26902.html]. This issue is to improve the documentation to reflect that.
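To make the source of confusion concrete, here is a small Scala sketch against the Spark 1.3-era API this issue was filed for. The file, table, and variable names are made up for illustration, and the sketch assumes an existing SparkContext named sc.

{code:scala}
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
val df = hiveContext.jsonFile("people.json")

// Despite the name, this creates a *managed* table whose data is written in a
// Spark SQL-specific (Parquet-based) layout by default, so it is not guaranteed
// to be readable as a plain Hive table from the Hive CLI.
df.saveAsTable("people_spark")

// Reading it back through Spark SQL works as expected.
hiveContext.table("people_spark").show()
{code}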
[jira] [Created] (SPARK-7084) Improve the saveAsTable documentation
madhukara phatak created SPARK-7084:
------------------------------------

             Summary: Improve the saveAsTable documentation
                 Key: SPARK-7084
                 URL: https://issues.apache.org/jira/browse/SPARK-7084
             Project: Spark
          Issue Type: Documentation
          Components: SQL
    Affects Versions: 1.3.1
            Reporter: madhukara phatak
            Priority: Minor


The documentation of saveAsTable is a little confusing. From the name of the API it sounds like it creates a Hive table that can be accessed from Hive, but that is not the case, as discussed here: [https://www.mailarchive.com/u...@spark.apache.org/msg26902.html]. This issue is to improve the documentation to reflect that.
[jira] [Commented] (SPARK-4414) SparkContext.wholeTextFiles Doesn't work with S3 Buckets
[ https://issues.apache.org/jira/browse/SPARK-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362856#comment-14362856 ]

madhukara phatak commented on SPARK-4414:
-----------------------------------------

Hi,
I just ran your example on my local machine; here is the gist: https://gist.github.com/phatak-dev/e75d5d0d773b857903c1. It works fine for me. Can you test the same?


SparkContext.wholeTextFiles Doesn't work with S3 Buckets
---------------------------------------------------------

                 Key: SPARK-4414
                 URL: https://issues.apache.org/jira/browse/SPARK-4414
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.1.0, 1.2.0
            Reporter: Pedro Rodriguez
            Priority: Critical

SparkContext.wholeTextFiles does not read files that SparkContext.textFile can read. Below are the general steps to reproduce; my specific case follows them in a git repo.

Steps to reproduce:
1. Create an Amazon S3 bucket, make it public, and add multiple files.
2. Attempt to read the bucket with sc.wholeTextFiles("s3n://mybucket/myfile.txt").
3. Spark returns the following error, even though the file exists:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /myfile.txt
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:517)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:489)
4. Change the call to sc.textFile("s3n://mybucket/myfile.txt") and there is no error message; the application runs fine.

There is a question on StackOverflow about this as well: http://stackoverflow.com/questions/26258458/sparkcontext-wholetextfiles-java-io-filenotfoundexception-file-does-not-exist

This link points to the repo/lines of code. The uncommented call doesn't work, while the commented call works as expected: https://github.com/EntilZha/nips-lda-spark/blob/45f5ad1e2646609ef9d295a0954fbefe84111d8a/src/main/scala/NipsLda.scala#L13-L19

It would be easy to work around this by using textFile with a multi-file argument, but wholeTextFiles should work correctly for S3 bucket files as well.
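For reference, a minimal Scala reproduction along the lines of the gist mentioned above. The bucket and file names are placeholders, and it assumes s3n credentials are already configured (fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey).

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("wholeTextFilesS3").setMaster("local[2]")
val sc = new SparkContext(conf)

// Works: textFile returns an RDD[String] of lines.
val lines = sc.textFile("s3n://mybucket/myfile.txt")
println(lines.count())

// Reported to fail with FileNotFoundException in the affected versions.
// When it works, wholeTextFiles returns an RDD[(fileName, fileContent)].
val files = sc.wholeTextFiles("s3n://mybucket/myfile.txt")
files.collect().foreach { case (name, content) =>
  println(s"$name -> ${content.length} chars")
}
{code}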