[jira] [Commented] (SPARK-6626) TwitterUtils.createStream documentation error
[ https://issues.apache.org/jira/browse/SPARK-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388389#comment-14388389 ]

Jayson Sunshine commented on SPARK-6626:

Okay, that sounds good. I will try to do this before end of day 31 Mar 2015 :)

TwitterUtils.createStream documentation error

Key: SPARK-6626
URL: https://issues.apache.org/jira/browse/SPARK-6626
Project: Spark
Issue Type: Documentation
Components: Documentation
Affects Versions: 1.3.0
Reporter: Jayson Sunshine
Priority: Minor
Labels: documentation, easyfix
Original Estimate: 5m
Remaining Estimate: 5m

At http://spark.apache.org/docs/1.3.0/streaming-programming-guide.html#input-dstreams-and-receivers, under 'Advanced Sources', the documentation provides the following call for Scala:

TwitterUtils.createStream(ssc)

However, with only one parameter to this method it appears a jssc object is required, not an ssc object:
http://spark.apache.org/docs/1.3.0/api/java/index.html?org/apache/spark/streaming/twitter/TwitterUtils.html

To make the above call work one must instead provide an Option argument, for example:

TwitterUtils.createStream(ssc, None)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6626) TwitterUtils.createStream documentation error
[ https://issues.apache.org/jira/browse/SPARK-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388401#comment-14388401 ]

Jayson Sunshine commented on SPARK-6626:

I submitted a pull request on GitHub for Apache/Spark. Is that the preferred way to do it?
[jira] [Created] (SPARK-6626) TwitterUtils.createStream documentation error
Jayson Sunshine created SPARK-6626:

Summary: TwitterUtils.createStream documentation error
Key: SPARK-6626
URL: https://issues.apache.org/jira/browse/SPARK-6626
Project: Spark
Issue Type: Documentation
Components: Documentation
Affects Versions: 1.3.0
Reporter: Jayson Sunshine
Priority: Minor

At http://spark.apache.org/docs/1.3.0/streaming-programming-guide.html#input-dstreams-and-receivers, under 'Advanced Sources', the documentation provides the following call for Scala:

TwitterUtils.createStream(ssc)

However, with only one parameter to this method it appears a jssc object is required, not an ssc object:
http://spark.apache.org/docs/1.3.0/api/java/index.html?org/apache/spark/streaming/twitter/TwitterUtils.html

To make the above call work one must instead provide an Option argument, for example:

TwitterUtils.createStream(ssc, None)
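To illustrate why the one-argument call is rejected for a Scala StreamingContext, here is a minimal sketch that compiles without Spark. Every type in it is a hypothetical stand-in, and the two method shapes are only modeled on what the report describes (a one-argument form that takes a jssc, and a form that takes an ssc plus an Option); they are assumptions, not the real Spark signatures.

```scala
// Hypothetical stand-ins for the Spark and twitter4j classes named in the report.
class StreamingContext
class JavaStreamingContext(val ssc: StreamingContext)
class Authorization

// Hypothetical model of the two overloads discussed in SPARK-6626.
object TwitterUtilsSketch {
  // Scala-facing form: the Option parameter has no default, so it must be passed.
  def createStream(
      ssc: StreamingContext,
      twitterAuth: Option[Authorization],
      filters: Seq[String] = Nil): String =
    "scala overload"

  // Java-facing form: the only overload that accepts a single argument.
  def createStream(jssc: JavaStreamingContext): String =
    "java overload"
}

object OverloadDemo {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext

    // TwitterUtilsSketch.createStream(ssc)
    // ^ does not compile: the only one-argument overload expects a
    //   JavaStreamingContext, and the two-argument form is missing twitterAuth.

    println(TwitterUtilsSketch.createStream(ssc, None))
    println(TwitterUtilsSketch.createStream(new JavaStreamingContext(ssc)))
  }
}
```

In this model, passing None as the second argument selects the Scala-facing form, which matches the fix proposed in the issue.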
[jira] [Commented] (SPARK-4414) SparkContext.wholeTextFiles Doesn't work with S3 Buckets
[ https://issues.apache.org/jira/browse/SPARK-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381315#comment-14381315 ]

Jayson Sunshine commented on SPARK-4414:

Is this issue related to the input source being compressed? Can wholeTextFiles handle compressed files similarly to textFile?

Pedro, I infer from your file name of 'myfile.txt' that it was not compressed. Is this true?

Phatak, in your gist you are grabbing a part file from s3, whereas Pedro was trying to read off a 'whole' file name. Do you guys think this matters?

I, too, have files on s3 that I can read with textFile but cannot read with wholeTextFiles.

SparkContext.wholeTextFiles Doesn't work with S3 Buckets

Key: SPARK-4414
URL: https://issues.apache.org/jira/browse/SPARK-4414
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.1.0, 1.2.0
Reporter: Pedro Rodriguez
Priority: Critical

SparkContext.wholeTextFiles does not read files which SparkContext.textFile can read. Below are general steps to reproduce; my specific case, in a git repo, follows them.

Steps to reproduce.
1. Create an Amazon S3 bucket with multiple files and make it public.
2. Attempt to read the bucket with sc.wholeTextFiles("s3n://mybucket/myfile.txt")
3. Spark returns the following error, even if the file exists.
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /myfile.txt
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:517)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:489)
4. Change the call to sc.textFile("s3n://mybucket/myfile.txt") and there is no error message; the application runs fine.

There is a question on StackOverflow as well on this:
http://stackoverflow.com/questions/26258458/sparkcontext-wholetextfiles-java-io-filenotfoundexception-file-does-not-exist

This is a link to the repo/lines of code. The uncommented call doesn't work; the commented call works as expected:
https://github.com/EntilZha/nips-lda-spark/blob/45f5ad1e2646609ef9d295a0954fbefe84111d8a/src/main/scala/NipsLda.scala#L13-L19

It would be easy to use textFile with a multifile argument, but this should work correctly for s3 bucket files as well.
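The reproduction steps can be sketched as a small standalone program. This is only a sketch under stated assumptions: the path is the placeholder from the report, the object name is hypothetical, the local master setting is chosen for illustration, and Spark plus Hadoop S3 support and credentials are assumed to be available on the machine. It runs both calls against the same path and prints whether each one succeeds, without asserting any particular outcome.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Try

// Hypothetical driver name; not taken from the reporter's repository.
object WholeTextFilesRepro {
  def main(args: Array[String]): Unit = {
    // Placeholder path from the report; pass a real one as the first argument.
    val path = args.headOption.getOrElse("s3n://mybucket/myfile.txt")

    // Assumption: a local master is enough to exercise the file-listing code path.
    val conf = new SparkConf().setAppName("SPARK-4414-repro").setMaster("local[*]")
    val sc = new SparkContext(conf)

    try {
      // textFile yields an RDD[String] with one element per line.
      val lineCount = Try(sc.textFile(path).count())
      println(s"textFile: $lineCount")

      // wholeTextFiles yields an RDD[(String, String)] of (file path, file content).
      val fileCount = Try(sc.wholeTextFiles(path).count())
      println(s"wholeTextFiles: $fileCount")
    } finally {
      sc.stop()
    }
  }
}
```

Wrapping each action in Try keeps a failure in one call from hiding the result of the other, which makes the two behaviours easy to compare in a single run.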