Re: sparkR could not find function "textFile"

Shivaram Venkataraman Thu, 25 Jun 2015 14:00:39 -0700

The `head` function is not supported for the RRDD that is returned by
`textFile`. You can run `take(lines, 5L)` instead. I should add a warning here
that the RDD API in SparkR is private because we might not support it in
upcoming releases, so if you can use the DataFrame API for your application,
you should try that out.
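
For reference, a minimal sketch of the two routes discussed in this thread,
assuming the sparkR shell from Spark 1.4 (which pre-creates `sc` and
`sqlContext`) and the file paths quoted below. The variable name
`first_lines` is only illustrative, and the private RDD calls may change or
disappear in later releases.

```r
# Assumption: running inside the Spark 1.4 sparkR shell, so `sc` and
# `sqlContext` already exist.

# Private RDD API: textFile() is not exported, so it needs the ::: prefix.
lines <- SparkR:::textFile(sc, "./README.md")

# head() is not defined for the RDD object; take() fetches the first
# 5 elements instead.
first_lines <- take(lines, 5L)
print(first_lines)

# Public DataFrame API: read.df() takes the source type explicitly
# ("json" here), and head() works on the resulting DataFrame.
people <- read.df(sqlContext, "./examples/src/main/resources/people.json", "json")
head(people)
```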


Thanks
Shivaram

On Thu, Jun 25, 2015 at 1:49 PM, Wei Zhou <zhweisop...@gmail.com> wrote:

> Hi Alek,
>
> Just a follow-up question. This is what I did in the sparkR shell:
>
> lines <- SparkR:::textFile(sc, "./README.md")
> head(lines)
>
> And I am getting this error:
>
> "Error in x[seq_len(n)] : object of type 'S4' is not subsettable"
>
> I'm wondering what I did wrong. Thanks in advance.
>
> Wei
>
> 2015-06-25 13:44 GMT-07:00 Wei Zhou <zhweisop...@gmail.com>:
>
>> Hi Alek,
>>
>> Thanks for the explanation, it is very helpful.
>>
>> Cheers,
>> Wei
>>
>> 2015-06-25 13:40 GMT-07:00 Eskilson,Aleksander <alek.eskil...@cerner.com>:
>>
>>>  Hi there,
>>>
>>>  The tutorial you’re reading was written before SparkR was merged into
>>> Spark for the 1.4.0 release.
>>> For the merge, the RDD API (which includes the textFile() function) was
>>> made private, as the devs felt many of its functions were too low level.
>>> They focused instead on finishing the DataFrame API which supports local,
>>> HDFS, and Hive/HBase file reads. In the meantime, the devs are trying to
>>> determine which functions of the RDD API, if any, should be made public
>>> again. You can see the rationale behind this decision on the issue’s JIRA
>>> [1].
>>>
>>>  You can still use those now-private RDD functions by prefixing the
>>> function call with the SparkR private namespace operator (:::). For
>>> example, you’d use
>>> SparkR:::textFile(…).
>>>
>>>  Hope that helps,
>>> Alek
>>>
>>>  [1] -- https://issues.apache.org/jira/browse/SPARK-7230
>>>
>>>   From: Wei Zhou <zhweisop...@gmail.com>
>>> Date: Thursday, June 25, 2015 at 3:33 PM
>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: sparkR could not find function "textFile"
>>>
>>>   Hi all,
>>>
>>>  I am exploring sparkR by activating the shell and following the
>>> tutorial here https://amplab-extras.github.io/SparkR-pkg/
>>>
>>>  When I tried to read in a local file with textFile(sc,
>>> "file_location"), I got the error: could not find function "textFile".
>>>
>>>  From reading the SparkR docs for 1.4, it seems that we need
>>> sqlContext to import data, for example:
>>>
>>> people <- read.df(sqlContext, "./examples/src/main/resources/people.json", "json")
>>>
>>> And we need to specify the file type.
>>>
>>>  My question is: has SparkR stopped supporting import of general file
>>> types? If not, I would appreciate any help on how to do this.
>>>
>>>  PS, I am trying to recreate the word count example in sparkR, and want
>>> to import README.md file, or just any file into sparkR.
>>>
>>>  Thanks in advance.
>>>
>>>  Best,
>>> Wei
>>>
>>
>>
>
