Re: merging small files in HDFS

2016-11-03 Thread dileep kumar
Hi ,

You need to write a map method that just parses the input file and passes it
to the reducer, and use only one reducer, so that all map output goes to a
single reducer and one file gets created, which is the merge of the input files.
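The same idea can be run without writing any Java at all, via Hadoop Streaming with `cat` as both mapper and reducer. A sketch, assuming placeholder input/output paths and a MapReduce-era layout (the streaming jar location varies by distribution, and note that the shuffle will sort lines, so the merged file's line order changes):

```shell
# Identity map + a single reduce: every map output goes to one reducer,
# so exactly one output file (part-00000) is produced -- the merge of
# all input files. Caveat: the shuffle sorts lines by content.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapreduce.job.reduces=1 \
  -input /data/small-files \
  -output /data/merged \
  -mapper /bin/cat \
  -reducer /bin/cat
```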

On 03-Nov-2016 8:54 pm, "Piyush Mukati"  wrote:

> Hi,
> I want to merge multiple files in one HDFS dir into one file. I am planning
> to write a map-only job using an input format which will create only one
> inputSplit per dir.
> This way my job doesn't need to do any shuffle/sort (only read and write
> back to disk).
> Is there any such input format already implemented?
> Or is there a better solution for the problem?
>
> thanks.
>
>
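For what it's worth, when no MapReduce pass is required, the HDFS shell already ships a merge helper. A sketch, assuming the data may round-trip through the client's local filesystem (the paths here are placeholders):

```shell
# Concatenate every file under an HDFS directory into one local file...
hadoop fs -getmerge /data/small-files /tmp/merged.txt
# ...then push the single merged file back up to HDFS.
hadoop fs -put /tmp/merged.txt /data/merged.txt
```

Unlike the single-reducer approach, this preserves the files' contents in listing order, but it does stream all the data through the machine running the command.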


Re: Steps to Run Spark Scala job from Oozie on EC2 Hadoop cluster

2016-03-07 Thread dileep kumar
Hi Divya,

Please find below code to invoke spark from oozie.

Oozie file (workflow.xml):

<workflow-app xmlns="uri:oozie:workflow:0.4" name="sparkshell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>maprfs:///</job-tracker>
            <name-node>maprfs:///</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>dileep</value>
                </property>
            </configuration>
            <exec>sparkshell.sh</exec>
            <file>/ggg/gms/gmsrffr/dev/dileep/sparkshell.sh</file>
            <file>/axp/gms/gmsrffr/dev/dileep/sparkshell.scala</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Java failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
##############
# sparkshell.sh
##############

/opt/mapr/spark/spark-1.2.1/bin/spark-shell --name perf108pret1 \
  --num-executors 1 --executor-cores 1 --executor-memory 1G \
  --driver-memory 2G -i sparkshell.scala
exit 0
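As an aside, piping a .scala file into spark-shell -i works, but for a compiled Scala job the usual route is spark-submit. A hypothetical sketch, with placeholder class and jar names (only the spark binary path and resource flags are taken from the script above):

```shell
# Hypothetical: --class, the jar name, and the master are placeholders
# for a job packaged with sbt/maven; resource flags mirror the script above.
/opt/mapr/spark/spark-1.2.1/bin/spark-submit \
  --class com.example.MySparkJob \
  --master yarn-cluster \
  --num-executors 1 --executor-cores 1 \
  --executor-memory 1G --driver-memory 2G \
  my-spark-job.jar
```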

On Mon, Mar 7, 2016 at 9:32 PM, Chandeep Singh  wrote:

> As a work around you could put your spark-submit statement in a shell
> script and then use Oozie’s SSH action to execute that script.
>
> On Mar 7, 2016, at 3:58 PM, Neelesh Salian  wrote:
>
> Hi Divya,
>
> This link should have the details that you need to begin using the Spark
> Action on Oozie:
> https://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html
>
> Thanks.
>
> On Mon, Mar 7, 2016 at 7:52 AM, Benjamin Kim  wrote:
>
>> To comment…
>>
>> At my company, we have not gotten it to work in any other mode than
>> local. If we try any of the yarn modes, it fails with a “file does not
>> exist” error when trying to locate the executable jar. I mentioned this to
>> the Hue users group, which we used for this, and they replied that the
>> Spark Action is very basic implementation and that they will be writing
>> their own for production use.
>>
>> That’s all I know...
>>
>> On Mar 7, 2016, at 1:18 AM, Deepak Sharma  wrote:
>>
>> There is Spark action defined for oozie workflows.
>> Though I am not sure if it supports only Java SPARK jobs or Scala jobs as
>> well.
>> https://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html
>> Thanks
>> Deepak
>>
>> On Mon, Mar 7, 2016 at 2:44 PM, Divya Gehlot 
>> wrote:
>>
>>> Hi,
>>>
>>> Could somebody help me by providing the steps /redirect me  to
>>> blog/documentation on how to run Spark job written in scala through Oozie.
>>>
>>> Would really appreciate the help.
>>>
>>>
>>>
>>> Thanks,
>>> Divya
>>>
>>
>>
>>
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
>> www.keosha.net
>>
>>
>>
>
>
> --
> Neelesh Srinivas Salian
> Customer Operations Engineer
>
>
>
>
>


-- 
Regards
Dileep Kumar
+91 9742443302


Re: Delete a folder name containing *

2014-08-20 Thread dileep kumar
Try
hadoop fs -mv /data/folder*/*  
Now you have only /data/folder*, and all the data under /data/folder* will
have been moved to the new folder; then delete /data/folder*. Not sure if it
works, just give it a try.


On Wed, Aug 20, 2014 at 8:26 AM, Ritesh Kumar Singh <
riteshoneinamill...@gmail.com> wrote:

> try putting the name in quotes
>
>
> On Wed, Aug 20, 2014 at 4:35 PM, praveenesh kumar 
> wrote:
>
>> With renaming - you would use the mv command "hadoop fs -mv /data/folder*
>> /data/new_folder". Won't it move all the sub_dirs along with that?
>>
>>
>> On Wed, Aug 20, 2014 at 12:00 PM, dileep kumar 
>> wrote:
>>
>>> Just Rename the folder.
>>>
>>>
>>> On Wed, Aug 20, 2014 at 6:53 AM, praveenesh kumar 
>>> wrote:
>>>
>>>> Hi team
>>>>
>>>> I am in a weird situation where I have the following HDFS sample folders:
>>>>
>>>> /data/folder/
>>>> /data/folder*
>>>> /data/folder_day
>>>> /data/folder_day/monday
>>>> /data/folder/1
>>>> /data/folder/2
>>>>
>>>> I want to delete /data/folder* without deleting its sub_folders. If I
>>>> do hadoop fs -rmr /data/folder* it will delete everything, which I want to
>>>> avoid. I tried the escape character \ but the HDFS FS shell is not taking it.
>>>> Any hints/tricks?
>>>>
>>>>
>>>> Regards
>>>> Praveenesh
>>>>
>>>
>>>
>>
>
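For the record, the combination that usually works here is single quotes plus a backslash: the quotes stop the local shell from expanding the *, and the backslash escapes it for the HDFS shell's own glob matcher. A sketch (behavior can vary across Hadoop versions, so verify the match with -ls before deleting):

```shell
# Confirm the escaped glob matches only the literal 'folder*' directory...
hadoop fs -ls '/data/folder\*'
# ...then delete just that directory, leaving /data/folder and its
# sub_folders untouched.
hadoop fs -rmr '/data/folder\*'
```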


Re: Delete a folder name containing *

2014-08-20 Thread dileep kumar
Just Rename the folder.


On Wed, Aug 20, 2014 at 6:53 AM, praveenesh kumar 
wrote:

> Hi team
>
> I am in a weird situation where I have the following HDFS sample folders:
>
> /data/folder/
> /data/folder*
> /data/folder_day
> /data/folder_day/monday
> /data/folder/1
> /data/folder/2
>
> I want to delete /data/folder* without deleting its sub_folders. If I do
> hadoop fs -rmr /data/folder* it will delete everything, which I want to
> avoid. I tried the escape character \ but the HDFS FS shell is not taking it.
> Any hints/tricks?
>
>
> Regards
> Praveenesh
>