Re: Error using collectAsMap() in Scala

2016-03-21 Thread Akhil Das
What you should be doing is a join, something like this:

//Create a (key, value) pair, the key being column1
val rdd1 = sc.textFile(file1).map(x => (x.split(",")(0), x.split(",")))

//Create a (key, value) pair, the key being column2
val rdd2 = sc.textFile(file2).map(x => (x.split(",")(1), x.split(",")))

//Now join the two datasets on the key
val joined = rdd1.join(rdd2)

//Now do the replacement
val replaced = joined.map(...)
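A minimal sketch of the elided replacement step, assuming plain CSV lines with no embedded commas and that the goal is to swap file1's column1 for file2's column1 on matching keys (the output format here is an assumption):

//joined has type RDD[(String, (Array[String], Array[String]))]
//For each matched key, replace column1 of the file1 row with
//column1 of the file2 row and rebuild the CSV line
val replaced = joined.map { case (_, (f1Cols, f2Cols)) =>
  (f2Cols(0) +: f1Cols.drop(1)).mkString(",")
}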

Thanks
Best Regards

On Mon, Mar 21, 2016 at 10:31 AM, Shishir Anshuman <shishiranshu...@gmail.com> wrote:

> I have stored the contents of two CSV files in separate RDDs.
>
> file1.csv format: *(column1, column2, column3)*
> file2.csv format: *(column1, column2)*
>
> *column1 of file1* and *column2 of file2* contain similar data. I want
> to compare the two columns and, if a match is found:
>
>    - Replace the data in *column1 (file1)* with *column1 (file2)*
>
>
> For this reason, I am not using a normal RDD.
>
> I am still new to Apache Spark, so any suggestion will be greatly
> appreciated.

Re: Error using collectAsMap() in Scala

2016-03-20 Thread Shishir Anshuman
I have stored the contents of two CSV files in separate RDDs.

file1.csv format: *(column1, column2, column3)*
file2.csv format: *(column1, column2)*

*column1 of file1* and *column2 of file2* contain similar data. I want to
compare the two columns and, if a match is found:

   - Replace the data in *column1 (file1)* with *column1 (file2)*


For this reason, I am not using a normal RDD.

I am still new to Apache Spark, so any suggestion will be greatly
appreciated.

On Mon, Mar 21, 2016 at 10:09 AM, Prem Sure  wrote:

> Any specific reason you would like to use collectAsMap only? You could
> probably move to a normal RDD instead of a pair RDD.


Re: Error using collectAsMap() in Scala

2016-03-20 Thread Prem Sure
Any specific reason you would like to use collectAsMap only? You could
probably move to a normal RDD instead of a pair RDD.


On Monday, March 21, 2016, Mark Hamstra  wrote:

> You're not getting what Ted is telling you.  Your `dict` is an RDD[String]
>  -- i.e. it is a collection of a single value type, String.  But
> `collectAsMap` is only defined for PairRDDs that have key-value pairs for
> their data elements.  Both a key and a value are needed to collect into a
> Map[K, V].


Re: Error using collectAsMap() in Scala

2016-03-20 Thread Mark Hamstra
You're not getting what Ted is telling you.  Your `dict` is an RDD[String]
 -- i.e. it is a collection of a single value type, String.  But
`collectAsMap` is only defined for PairRDDs that have key-value pairs for
their data elements.  Both a key and a value are needed to collect into a
Map[K, V].
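A minimal sketch of the distinction, assuming an existing SparkContext `sc`:

val strings = sc.parallelize(Seq("a,1", "b,2")) // RDD[String]
//strings.collectAsMap() // does not compile: elements are not key-value pairs

//Splitting each line into a two-element tuple yields an RDD[(String, String)],
//which does support collectAsMap
val pairs = strings.map { line =>
  val cols = line.split(",")
  (cols(0), cols(1))
}
val asMap = pairs.collectAsMap() // Map(a -> 1, b -> 2)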

On Sun, Mar 20, 2016 at 8:19 PM, Shishir Anshuman  wrote:

> Yes, I have included that class in my code.
> I guess it's something to do with the RDD format; I am not able to
> figure out the exact reason.


Re: Error using collectAsMap() in Scala

2016-03-19 Thread Ted Yu
It is defined in:
core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
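The methods in that file attach to an RDD only when its elements are key-value tuples: PairRDDFunctions is brought in through an implicit conversion (rddToPairRDDFunctions, supplied by the RDD companion object since Spark 1.3), so collectAsMap is simply not visible on an RDD[String]. A tiny illustration, assuming an existing SparkContext sc:

//Compiles because the elements are (K, V) tuples, so the implicit applies
val m = sc.parallelize(Seq(("a", 1), ("b", 2))).collectAsMap()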

On Thu, Mar 17, 2016 at 8:55 PM, Shishir Anshuman  wrote:

> I am using the following code snippet in Scala:
>
>
> *val dict: RDD[String] = sc.textFile("path/to/csv/file")*
> *val dict_broadcast=sc.broadcast(dict.collectAsMap())*
>
> On compiling, it generates this error:
>
> *scala:42: value collectAsMap is not a member of
> org.apache.spark.rdd.RDD[String]*
>
>
> *val dict_broadcast=sc.broadcast(dict.collectAsMap())
> ^*
>


Error using collectAsMap() in Scala

2016-03-19 Thread Shishir Anshuman
I am using the following code snippet in Scala:


*val dict: RDD[String] = sc.textFile("path/to/csv/file")*
*val dict_broadcast=sc.broadcast(dict.collectAsMap())*

On compiling, it generates this error:

*scala:42: value collectAsMap is not a member of
org.apache.spark.rdd.RDD[String]*


*val dict_broadcast=sc.broadcast(dict.collectAsMap())
  ^*
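
One possible fix, sketched under the assumption that the first two comma-separated fields of each line should become the key and the value (the field indices are an assumption):

import org.apache.spark.rdd.RDD

//collectAsMap needs key-value elements, so split each line into a tuple first
val dict: RDD[(String, String)] = sc.textFile("path/to/csv/file").map { line =>
  val cols = line.split(",")
  (cols(0), cols(1))
}
//Now collectAsMap compiles, and the result can be broadcast as before
val dict_broadcast = sc.broadcast(dict.collectAsMap())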