[jira] [Updated] (SPARK-20723) Random Forest Classifier should expose intermediateRDDStorageLevel similar to ALS

2017-05-13 Thread madhukara phatak (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

madhukara phatak updated SPARK-20723:
-------------------------------------
Description: 
Currently the Random Forest implementation caches the intermediate data using the
*MEMORY_AND_DISK* storage level. This causes problems in low-memory scenarios,
so we should expose an expert param *intermediateStorageLevel* that allows the
user to customise the storage level. This is similar to the ALS option described
in the JIRA below:

https://issues.apache.org/jira/browse/SPARK-14412

  was:
Currently the Random Forest implementation caches the intermediate data using the
*MEMORY_AND_DISK* storage level. This causes problems in low-memory scenarios,
so we should expose an expert param *intermediateRDDStorageLevel* that allows the
user to customise the storage level. This is similar to the ALS option described
in the JIRA below:

https://issues.apache.org/jira/browse/SPARK-14412


> Random Forest Classifier should expose intermediateRDDStorageLevel similar to 
> ALS
> ------------------------------------------------------------------------------
>
> Key: SPARK-20723
> URL: https://issues.apache.org/jira/browse/SPARK-20723
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: madhukara phatak
>Priority: Minor
>
> Currently the Random Forest implementation caches the intermediate data using
> the *MEMORY_AND_DISK* storage level. This causes problems in low-memory
> scenarios, so we should expose an expert param *intermediateStorageLevel* that
> allows the user to customise the storage level. This is similar to the ALS
> option described in the JIRA below:
> https://issues.apache.org/jira/browse/SPARK-14412



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20723) Random Forest Classifier should expose intermediateRDDStorageLevel similar to ALS

2017-05-12 Thread madhukara phatak (JIRA)
madhukara phatak created SPARK-20723:


 Summary: Random Forest Classifier should expose 
intermediateRDDStorageLevel similar to ALS
 Key: SPARK-20723
 URL: https://issues.apache.org/jira/browse/SPARK-20723
 Project: Spark
  Issue Type: New Feature
  Components: ML
Affects Versions: 2.3.0
Reporter: madhukara phatak
Priority: Minor


Currently the Random Forest implementation caches the intermediate data using the
*MEMORY_AND_DISK* storage level. This causes problems in low-memory scenarios,
so we should expose an expert param *intermediateRDDStorageLevel* that allows the
user to customise the storage level. This is similar to the ALS option described
in the JIRA below:

https://issues.apache.org/jira/browse/SPARK-14412
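As a rough illustration of the requested expert param, here is a minimal sketch in plain Python standing in for Spark's Scala API. The class name, parameter name, and level set below are hypothetical, not Spark's actual API:

```python
# Hypothetical sketch of an estimator exposing an expert storage-level
# param, modeled on ALS's intermediateRDDStorageLevel. Names here are
# illustrative only, not Spark's actual API.

VALID_LEVELS = {"MEMORY_ONLY", "MEMORY_AND_DISK", "DISK_ONLY", "NONE"}

class RandomForestSketch:
    def __init__(self, intermediate_storage_level="MEMORY_AND_DISK"):
        # Default matches the current hard-coded behavior; low-memory
        # users could pass e.g. "DISK_ONLY" instead.
        if intermediate_storage_level not in VALID_LEVELS:
            raise ValueError("invalid level: %r" % intermediate_storage_level)
        self.intermediate_storage_level = intermediate_storage_level

    def fit(self, rows):
        # Real Spark code would persist the intermediate RDDs at the
        # chosen level here instead of always using MEMORY_AND_DISK.
        return {"persisted_at": self.intermediate_storage_level,
                "n_rows": len(rows)}
```

Keeping *MEMORY_AND_DISK* as the default would preserve today's behavior for existing users while letting expert users opt out.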






[jira] [Updated] (SPARK-7084) Improve the saveAsTable documentation

2015-04-23 Thread madhukara phatak (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

madhukara phatak updated SPARK-7084:

Description: The documentation of saveAsTable is a little confusing. From the
name of the API it sounds like it creates a Hive table that can be accessed
from Hive, but that is not the case, as discussed here:
[https://www.mailarchive.com/u...@spark.apache.org/msg26902.html] . This issue
is to improve the documentation to reflect that.  (was: The documentation of
saveAsTable is a little confusing. From the name of the API it sounds like it
creates a Hive table that can be accessed from Hive, but that is not the case,
as discussed here
[https://www.mailarchive.com/u...@spark.apache.org/msg26902.html] . This issue
is to improve the documentation to reflect that.)

 Improve the saveAsTable documentation
 -------------------------------------

 Key: SPARK-7084
 URL: https://issues.apache.org/jira/browse/SPARK-7084
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 1.3.1
Reporter: madhukara phatak
Priority: Minor

 The documentation of saveAsTable is a little confusing. From the name of the
 API it sounds like it creates a Hive table that can be accessed from Hive,
 but that is not the case, as discussed here
 [https://www.mailarchive.com/u...@spark.apache.org/msg26902.html] . This
 issue is to improve the documentation to reflect that.






[jira] [Created] (SPARK-7084) Improve the saveAsTable documentation

2015-04-23 Thread madhukara phatak (JIRA)
madhukara phatak created SPARK-7084:
---

 Summary: Improve the saveAsTable documentation
 Key: SPARK-7084
 URL: https://issues.apache.org/jira/browse/SPARK-7084
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 1.3.1
Reporter: madhukara phatak
Priority: Minor


The documentation of saveAsTable is a little confusing. From the name of the
API it sounds like it creates a Hive table that can be accessed from Hive, but
that is not the case, as discussed here:
[https://www.mailarchive.com/u...@spark.apache.org/msg26902.html] . This issue
is to improve the documentation to reflect that.
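One hedged sketch of how the clarified wording might read, expressed as a Python-style docstring on a placeholder stub (the wording and the stub below are illustrative, not Spark's actual source):

```python
# Illustrative stub carrying a candidate clarified docstring; the
# function body is a placeholder, not Spark's implementation.

def saveAsTable(table_name):
    """Save the DataFrame's contents as a managed table in Spark SQL's
    own metastore.

    Note: despite the name, the resulting table is stored in a
    Spark-specific format and is not directly queryable from Hive.
    """
    return "saved %s to the Spark SQL metastore" % table_name
```

The key addition is the explicit note that the table is not directly visible to Hive, which is the exact confusion raised on the mailing list thread linked above.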






[jira] [Commented] (SPARK-4414) SparkContext.wholeTextFiles Doesn't work with S3 Buckets

2015-03-16 Thread madhukara phatak (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362856#comment-14362856
 ] 

madhukara phatak commented on SPARK-4414:
-----------------------------------------

Hi,
 I just ran your example on my local machine; here is the gist:
https://gist.github.com/phatak-dev/e75d5d0d773b857903c1. It works fine for me.
Can you test the same?

 SparkContext.wholeTextFiles Doesn't work with S3 Buckets
 

 Key: SPARK-4414
 URL: https://issues.apache.org/jira/browse/SPARK-4414
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0, 1.2.0
Reporter: Pedro Rodriguez
Priority: Critical

 SparkContext.wholeTextFiles does not read files which SparkContext.textFile
 can read. Below are general steps to reproduce; my specific case follows
 that on a git repo.
 Steps to reproduce:
 1. Create an Amazon S3 bucket, make it public, with multiple files
 2. Attempt to read the bucket with
 sc.wholeTextFiles("s3n://mybucket/myfile.txt")
 3. Spark returns the following error, even though the file exists:
 Exception in thread "main" java.io.FileNotFoundException: File does not
 exist: /myfile.txt
   at
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:517)
   at
 org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:489)
 4. Change the call to
 sc.textFile("s3n://mybucket/myfile.txt")
 and there is no error message; the application runs fine.
 There is a question on StackOverflow about this as well:
 http://stackoverflow.com/questions/26258458/sparkcontext-wholetextfiles-java-io-filenotfoundexception-file-does-not-exist
 This is a link to the repo/lines of code. The uncommented call doesn't work;
 the commented call works as expected:
 https://github.com/EntilZha/nips-lda-spark/blob/45f5ad1e2646609ef9d295a0954fbefe84111d8a/src/main/scala/NipsLda.scala#L13-L19
 It would be easy to use textFile with a multi-file argument, but this should
 work correctly for S3 bucket files as well.
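For readers comparing the two APIs, the semantic difference (independent of the S3 path bug reported here) can be sketched without a cluster. The helpers below are plain-Python stand-ins, not Spark calls: textFile yields one record per line, while wholeTextFiles yields one (path, full-content) pair per file.

```python
import os
import tempfile

# Plain-Python stand-ins for the two RDD-producing calls; the real
# APIs return RDDs and, per this report, resolve s3n:// paths
# differently. These helpers are illustrative only.

def read_lines(path):
    # Like sc.textFile: one record per line, file identity discarded.
    with open(path) as f:
        return [line.rstrip("\n") for line in f]

def read_whole_files(directory):
    # Like sc.wholeTextFiles: one (path, content) record per file.
    records = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        with open(full) as f:
            records.append((full, f.read()))
    return records

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "a.txt")
    with open(path, "w") as f:
        f.write("one\ntwo\n")
    lines = read_lines(path)      # one element per line
    whole = read_whole_files(d)   # one (path, content) pair per file
```

This is why wholeTextFiles must list the files in a directory up front, which is where the reported path-resolution failure occurs.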


