Re: Large dataset on hbase

2016-04-13 Thread prabhu Mahendran
Hi,

1. Is the output of your Pig script a single file that contains all the
JSON documents corresponding to your CSV?

Yes, the output of my Pig script is a single file containing all the JSON
documents corresponding to the CSV.

2. Also, are there any errors in logs/nifi-app.log (or on the processor in
the UI) when this happens?

No, there are no errors in either the web interface (UI) or the
logs/nifi-app.log file.


Thanks,

Prabhu Mahendran



Re: Large dataset on hbase

2016-04-12 Thread Bryan Bende
Is the output of your Pig script a single file that contains all the JSON
documents corresponding to your CSV, or does it create a single JSON
document for each row of the CSV?

Also, are there any errors in logs/nifi-app.log (or on the processor in the
UI) when this happens?

-Bryan



Re: Large dataset on hbase

2016-04-12 Thread prabhu Mahendran
Hi,

I simply use a Pig script, run via ExecuteProcess, to convert the CSV into
JSON.

In my case I use n1 from the JSON document as the row key in the HBase
table, so n2-n22 are stored as columns in HBase.

Some of the rows (n1 values) are stored in the table, but the remaining
rows are read correctly yet never stored.
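
For context, here is a minimal sketch of the kind of Pig script I mean.
The paths, the all-chararray schema, and the use of Pig's built-in
JsonStorage are illustrative only, not my actual script:

    -- load the 22 CSV columns and write each record back out as JSON
    data = LOAD '/path/to/input.csv' USING PigStorage(',')
           AS (n1:chararray, n2:chararray, n3:chararray, n4:chararray,
               n5:chararray, n6:chararray, n7:chararray, n8:chararray,
               n9:chararray, n10:chararray, n11:chararray, n12:chararray,
               n13:chararray, n14:chararray, n15:chararray, n16:chararray,
               n17:chararray, n18:chararray, n19:chararray, n20:chararray,
               n21:chararray, n22:chararray);
    STORE data INTO '/path/to/output_json' USING JsonStorage();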

Thanks,
Prabhu Mahendran



Re: Large dataset on hbase

2016-04-12 Thread Bryan Bende
Hi Prabhu,

How did you end up converting your CSV into JSON?

PutHBaseJSON creates a single row from a JSON document. In your example
above, using n1 as the rowId, it would create a row with columns n2 - n22.
Are you seeing columns missing, or are you missing whole rows from your
original CSV?
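
For example (the values here are made up purely to illustrate the mapping),
a document like

    { "n1":"row-001", "n2":"a", "n3":"b" }

with Row Identifier Field Name = n1 and Column Family = Sweet would become
one HBase row with row key row-001 and the columns Sweet:n2=a and
Sweet:n3=b.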

Thanks,

Bryan





Re: Large dataset on hbase

2016-04-11 Thread prabhu Mahendran
Hi Simon/Joe,

Thanks for the support.
I have successfully converted the CSV data into JSON and inserted the JSON
data into an HBase table using PutHBaseJSON.
Part of the sample JSON data is shown below:

{
"n1":"", "n2":"", "n3":"", "n4":"", "n5":"", "n6":"",
"n7":"", "n8":"", "n9":"", "n10":"", "n11":"", "n12":"",
"n13":"", "n14":"", "n15":"", "n16":"", "n17":"", "n18":"",
"n19":"", "n20":"", "n21":"-", "n22":""
}
PutHBaseJSON configuration:
    Table Name: 'Hike', Column Family: 'Sweet',
    Row Identifier Field Name: n1 (an element in the JSON document).

My data contains 15 lakh (1.5 million) rows, but the HBase table contains
only 10 rows. The flow reads all 15 lakh rows but stores only a handful.
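
For reference, the stored row count can be double-checked from the HBase
shell (the command below simply assumes the table name given above):

    count 'Hike'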

Can anyone please help me solve this?






Re: Large dataset on hbase

2016-04-09 Thread Joe Witt
Prabhu,

If the dataset being processed can be split up and still retain the
necessary meaning when input to HBase, I'd recommend doing that. NiFi
itself, as a framework, can handle very large objects because its API
doesn't force loading of entire objects into memory. However, various
processors may do that, and I believe ReplaceText may be one that does.
You can use SplitText, ExecuteScript, or other processors to do that
splitting if that will help your case.
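
As an illustration only (the processors are real, but the layout and the
split size are just a sketch to adapt to your data), a split-based flow
could look like:

    GetFile -> SplitText (Line Split Count = 10000)
            -> your CSV-to-JSON conversion step
            -> PutHBaseJSON

so that no single FlowFile has to carry the entire file through the
memory-sensitive processors.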

Thanks
Joe



Re: Large dataset on hbase

2016-04-09 Thread Simon Ball
Hi Prabhu,

Did you try increasing the heap size in conf/bootstrap.conf? By default
NiFi uses a very small RAM allocation (512 MB). You can increase this by
tweaking java.arg.2 and .3 in the bootstrap.conf file. Note that this is
the Java heap, so you will need more than your data size to account for
Java object overhead. The other thing to check is the buffer size you are
using for your ReplaceText processors. If you're also using Split
processors, you can sometimes run up against RAM and open-file limits; if
that is the case, make sure you increase the ulimit -n setting.
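
For reference, those bootstrap.conf entries default to the values below;
the raised figures are only an example to adjust to your data and
available memory:

    # defaults
    java.arg.2=-Xms512m
    java.arg.3=-Xmx512m

    # e.g. raised for a multi-GB dataset
    java.arg.2=-Xms4g
    java.arg.3=-Xmx4g

The open-file limit can be raised for the user running NiFi with, for
example, ulimit -n 50000 before startup.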

Simon



Large dataset on hbase

2016-04-09 Thread prabhu Mahendran
Hi,

I am new to NiFi and do not know how to process large data, such as a
1 GB CSV file, into HBase. Trying the combination of GetFile and PutHBase
shell leads to a Java out-of-memory error, and the combination of
ReplaceText, ExtractText, and PutHBaseJSON doesn't work on the large
dataset, although it works correctly on a smaller dataset.
Can anyone please help me solve this?
Thanks in advance.

Thanks & Regards,
Prabhu Mahendran