Thanks Cheng !!

On Thu, Apr 24, 2014 at 5:43 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

> You may try this:
>
> val lastOption = sc.textFile("input").mapPartitions { iterator =>
>   if (iterator.isEmpty) {
>     iterator
>   } else {
>     Iterator
>       .continually((iterator.next(), iterator.hasNext))
>       .collect { case (value, false) => value }
>       .take(1)
>   }
> }.collect().lastOption
>
> Iterator-based data access ensures O(1) space complexity, and it runs
> faster because different partitions are processed in parallel. lastOption is
> used instead of last to deal with an empty file.
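To check the per-partition logic without a cluster, the same iterator trick can be tried on plain Scala collections. This is only a minimal sketch: `LastPerPartition`, `lastOnly` and the `Seq[Seq[T]]` stand-in for partitions are hypothetical names of my own, not Spark API.

```scala
object LastPerPartition {
  // Returns an iterator holding only the final element of `iterator`
  // (or nothing if it is empty), reading one element at a time.
  def lastOnly[T](iterator: Iterator[T]): Iterator[T] =
    if (iterator.isEmpty) {
      iterator
    } else {
      Iterator
        // Pair each element with "is there anything after it?".
        .continually((iterator.next(), iterator.hasNext))
        // Keep only the element that has nothing after it.
        .collect { case (value, false) => value }
        .take(1)
    }

  // Hypothetical stand-in for an RDD: each inner Seq plays one partition.
  // Mirrors mapPartitions(...).collect().lastOption from the snippet above.
  def lastOption[T](partitions: Seq[Seq[T]]): Option[T] =
    partitions.flatMap(p => lastOnly(p.iterator)).lastOption
}
```

Empty partitions contribute nothing, so the result is the last element of the last non-empty partition, or None when there is no data at all.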
>
>
> On Thu, Apr 24, 2014 at 7:38 PM, Sai Prasanna <ansaiprasa...@gmail.com> wrote:
>
>> Hi All, I finally wrote the following code, which I feel does the job
>> well, if not optimally.
>> It uses a file pointer and seeks backwards from the end of the file to the
>> byte after the last \n before the final line.
>> This is memory efficient, and I expect the Unix tail implementation does
>> something similar.
>>
>> import java.io.RandomAccessFile
>>
>> val FILEPATH = "/home/sparkcluster/hadoop-2.3.0/temp"
>> val fileHandler = new RandomAccessFile(FILEPATH, "r")
>> val bdd = try {
>>   // Read the single byte stored at the given offset.
>>   def byteAt(pos: Long): Byte = { fileHandler.seek(pos); fileHandler.readByte() }
>>   // Ignore one trailing line terminator ("\n", "\r\n" or "\r") so that it
>>   // is not mistaken for an empty last line.
>>   var end = fileHandler.length()
>>   if (end > 0 && byteAt(end - 1) == 0xA) end -= 1
>>   if (end > 0 && byteAt(end - 1) == 0xD) end -= 1
>>   // Scan backwards to the previous terminator or to the start of the file.
>>   var start = end
>>   while (start > 0 && byteAt(start - 1) != 0xA && byteAt(start - 1) != 0xD)
>>     start -= 1
>>   val bytes = new Array[Byte]((end - start).toInt)
>>   fileHandler.seek(start)
>>   fileHandler.readFully(bytes)
>>   new String(bytes)  /* bdd contains the last line */
>> } finally {
>>   fileHandler.close()
>> }
>>
>>
>>
>>
>> On Thu, Apr 24, 2014 at 11:42 AM, Sai Prasanna
>> <ansaiprasa...@gmail.com> wrote:
>>
>>> Thanks Guys !
>>>
>>>
>>> On Thu, Apr 24, 2014 at 11:29 AM, Sourav Chandra <
>>> sourav.chan...@livestream.com> wrote:
>>>
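>>>> The same thing can also be done using rdd.top(1)(ordering), where
>>>> ordering is the original (non-reversed) Ordering[T], since top returns
>>>> the largest elements under the ordering it is given.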
>>>>
>>>>
>>>>
>>>> On Thu, Apr 24, 2014 at 11:28 AM, Sourav Chandra <
>>>> sourav.chan...@livestream.com> wrote:
>>>>
>>>>> You can use rdd.takeOrdered(1)(reverseOrdering)
>>>>>
>>>>> reverseOrdering is your Ordering[T] instance, where you define the
>>>>> ordering logic. You have to pass it to the method.
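A minimal local sketch of what such an Ordering could look like, assuming each element is first paired with its position (as RDD.zipWithIndex does in Spark versions that provide it). The object and method names are hypothetical, and a plain Seq stands in for the RDD.

```scala
object ReverseOrderingSketch {
  // Orders (element, position) pairs by descending position,
  // so the final element of the input sorts first.
  def reverseOrdering[T]: Ordering[(T, Long)] =
    Ordering.by[(T, Long), Long](_._2).reverse

  // Local analogue of takeOrdered(1)(reverseOrdering) on indexed data.
  def lastOption[T](elements: Seq[T]): Option[T] = {
    val indexed = elements.zipWithIndex.map { case (e, i) => (e, i.toLong) }
    indexed.sorted(reverseOrdering[T]).headOption.map(_._1)
  }
}
```

With an RDD, the analogous call would presumably be rdd.zipWithIndex().takeOrdered(1)(reverseOrdering), at the cost of the extra work that indexing requires.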
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 24, 2014 at 11:21 AM, Frank Austin Nothaft <
>>>>> fnoth...@berkeley.edu> wrote:
>>>>>
>>>>>> If you do this, you could simplify to:
>>>>>>
>>>>>> RDD.collect().last
>>>>>>
>>>>>> However, this has the problem of collecting all data to the driver.
>>>>>>
>>>>>> Is your data sorted? If so, you could reverse the sort and take the
>>>>>> first. Alternatively, a hacky implementation might involve a
>>>>>> mapPartitionsWithIndex that returns an empty iterator for all partitions
>>>>>> except for the last. For the last partition, you would filter out all
>>>>>> elements except for the last element in your iterator. This should leave
>>>>>> one element, which is your last element.
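The mapPartitionsWithIndex idea can be sketched locally as follows. This is an illustration under assumptions, not tested Spark code: `LastPartitionOnly` and `keepLast` are hypothetical names, and a `Seq[Seq[T]]` stands in for the RDD's partitions.

```scala
object LastPartitionOnly {
  // Same shape as the function handed to mapPartitionsWithIndex:
  // (partition index, partition iterator) => output iterator.
  def keepLast[T](lastIndex: Int)(index: Int, iterator: Iterator[T]): Iterator[T] =
    if (index != lastIndex) {
      // Every partition except the last contributes nothing.
      Iterator.empty
    } else {
      // Walk the last partition, remembering only the most recent element.
      var last: Option[T] = None
      while (iterator.hasNext) last = Some(iterator.next())
      last.iterator
    }

  // Hypothetical stand-in for running the function over all partitions.
  def lastOption[T](partitions: Seq[Seq[T]]): Option[T] = {
    val lastIndex = partitions.length - 1
    partitions.zipWithIndex
      .flatMap { case (p, i) => keepLast(lastIndex)(i, p.iterator) }
      .headOption
  }
}
```

In Spark the last index would presumably come from rdd.partitions.length - 1. Note that this variant returns nothing when the last partition happens to be empty, even if earlier partitions hold data.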
>>>>>>
>>>>>> Frank Austin Nothaft
>>>>>> fnoth...@berkeley.edu
>>>>>> fnoth...@eecs.berkeley.edu
>>>>>> 202-340-0466
>>>>>>
>>>>>> On Apr 23, 2014, at 10:44 PM, Adnan Yaqoob <nsyaq...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> This function returns a Scala Array; you can use Array's last
>>>>>> method to get the last element.
>>>>>>
>>>>>> For example:
>>>>>>
>>>>>> RDD.take(RDD.count().toInt).last
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 24, 2014 at 10:28 AM, Sai Prasanna <
>>>>>> ansaiprasa...@gmail.com> wrote:
>>>>>>
>>>>>>> Adnan, but RDD.take(RDD.count().toInt) returns all the elements of
>>>>>>> the RDD.
>>>>>>>
>>>>>>> I only want to access the last element.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 24, 2014 at 10:33 AM, Sai Prasanna <
>>>>>>> ansaiprasa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Oh ya, Thanks Adnan.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 24, 2014 at 10:30 AM, Adnan Yaqoob
>>>>>>>> <nsyaq...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> You can use the following code:
>>>>>>>>>
>>>>>>>>> RDD.take(RDD.count().toInt)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Apr 24, 2014 at 9:51 AM, Sai Prasanna <
>>>>>>>>> ansaiprasa...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All, some help please!
>>>>>>>>>> RDD.first or RDD.take(1) gives the first item. Is there a
>>>>>>>>>> straightforward way to access the last element in a similar way?
>>>>>>>>>>
>>>>>>>>>> I couldn't find a tail/last method for RDD.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Sourav Chandra
>>>>>
>>>>> Senior Software Engineer
>>>>>
>>>>> · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
>>>>>
>>>>> sourav.chan...@livestream.com
>>>>>
>>>>> o: +91 80 4121 8723
>>>>>
>>>>> m: +91 988 699 3746
>>>>>
>>>>> skype: sourav.chandra
>>>>>
>>>>> Livestream
>>>>>
>>>>> "Ajmera Summit", First Floor, #3/D, 68 Ward, 3rd Cross, 7th C Main,
>>>>> 3rd Block, Koramangala Industrial Area,
>>>>>
>>>>> Bangalore 560034
>>>>>
>>>>> www.livestream.com
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
