Re: Large dataset on hbase
Hi,

1. Is the output of your Pig script a single file that contains all the JSON documents corresponding to your CSV? Yes, the output of my Pig script is a single file containing all the JSON documents corresponding to the CSV.

2. Are there any errors in logs/nifi-app.log (or on the processor in the UI) when this happens? No, there are no errors in either the web interface (UI) or the logs/nifi-app.log file.

Thanks,
Prabhu Mahendran
Re: Large dataset on hbase
Is the output of your Pig script a single file that contains all the JSON documents corresponding to your CSV, or does it create a single JSON document for each row of the CSV?

Also, are there any errors in logs/nifi-app.log (or on the processor in the UI) when this happens?

-Bryan
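A sketch of the two output shapes being asked about here, with invented field values: a single file holding all the documents (for example, one JSON object per line), like

    {"n1":"row1","n2":"a","n3":"b"}
    {"n1":"row2","n2":"c","n3":"d"}
    {"n1":"row3","n2":"e","n3":"f"}

versus one file per CSV row, each containing exactly one document:

    {"n1":"row1","n2":"a","n3":"b"}

The distinction matters because PutHBaseJSON maps one JSON document to one HBase row, so the second shape gives it one row per FlowFile to write.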
Re: Large dataset on hbase
Hi,

I just used a Pig script to convert the CSV into JSON, with the help of ExecuteProcess.

In my case I use n1 from the JSON document as the row key in the HBase table, so n2-n22 are stored as columns in HBase.

Some of the rows (n1's) are stored in the table, but the remaining rows are read correctly yet never stored.

Thanks,
Prabhu Mahendran
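For context, a minimal sketch of the kind of Pig script that could do this conversion, assuming a comma-delimited input and the n1-n22 schema; the paths and schema here are illustrative assumptions, not the actual script:

    -- hypothetical CSV-to-JSON conversion in Pig
    rows = LOAD 'input.csv' USING PigStorage(',')
           AS (n1:chararray, n2:chararray, n3:chararray); -- ... through n22
    STORE rows INTO 'output_json' USING JsonStorage();

Pig's built-in JsonStorage writes one JSON object per record into a single output, which would match "a single file that contains all the JSON documents" rather than one document per file.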
Re: Large dataset on hbase
Hi Prabhu,

How did you end up converting your CSV into JSON?

PutHBaseJSON creates a single row from a JSON document. In your example above, using n1 as the rowId, it would create a row with columns n2 - n22. Are you seeing columns missing, or are you missing whole rows from your original CSV?

Thanks,

Bryan
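To make that mapping concrete, a hypothetical document (values invented) and the row it would produce with the column family 'Sweet' from the example above:

    {"n1":"key-1", "n2":"v2", "n3":"v3"}

    row key:  key-1
    Sweet:n2 = v2
    Sweet:n3 = v3

Each incoming JSON document becomes one put against the row named by the Row Identifier field.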
Re: Large dataset on hbase
Hi Simon/Joe,

Thanks for this support. I have successfully converted the CSV data into JSON and also inserted that JSON data into an HBase table using PutHBaseJSON.

Part of the JSON sample data looks like this:

    {
      "n1":"", "n2":"", "n3":"", "n4":"", "n5":"", "n6":"", "n7":"",
      "n8":"", "n9":"", "n10":"", "n11":"", "n12":"", "n13":"",
      "n14":"", "n15":"", "n16":"", "n17":"", "n18":"", "n19":"",
      "n20":"", "n21":"-", "n22":""
    }

PutHBaseJSON: Table Name is 'Hike', Column Family is 'Sweet', Row Identifier Field Name is n1 (an element in the JSON file).

My file contains 15 lakh (1.5 million) rows, but the HBase table contains only 10 rows. All 15 lakh rows are read, but only a few are stored.

Can anyone please help me solve this?
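One way to see what actually landed, using the table name above, is the HBase shell. Note that HBase overwrites cells that share a row key rather than adding new rows, so if many records carry the same n1 value they would collapse into a few rows; that is only a guess from the symptoms described here, not a confirmed diagnosis:

    count 'Hike'
    scan 'Hike', {LIMIT => 5}

If the count stays small while the scanned rows look fully populated, duplicate n1 values would be worth ruling out.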
Re: Large dataset on hbase
Prabhu,

If the dataset being processed can be split up and still retain the necessary meaning when input to HBase, I'd recommend doing that. NiFi itself, as a framework, can handle very large objects because its API doesn't force loading entire objects into memory. However, various processors may do that, and I believe ReplaceText may be one that does. You can use SplitText or ExecuteScript or other processors to do that splitting if that will help your case.

Thanks
Joe
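A sketch of one possible flow along these lines, splitting the large CSV into bounded chunks before conversion; the 10,000-line chunk size is an arbitrary assumption:

    GetFile
      -> SplitText (Line Split Count = 10000)
      -> [CSV-to-JSON conversion]
      -> PutHBaseJSON

SplitText's Line Split Count property controls how many lines go into each outgoing FlowFile, which keeps any per-FlowFile buffering in downstream processors bounded.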
Re: Large dataset on hbase
Hi Prabhu,

Did you try increasing the heap size in conf/bootstrap.conf? By default NiFi uses a very small RAM allocation (512MB). You can increase this by tweaking java.arg.2 and .3 in the bootstrap.conf file. Note that this is the Java heap, so you will need more than your data size to account for Java object overhead. The other thing to check is the buffer sizes you are using for your ReplaceText processors. If you're also using Split processors, you can sometimes run up against RAM and open-file limits; if this is the case, make sure you increase the ulimit -n setting.

Simon
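For reference, the relevant default lines in conf/bootstrap.conf, and what a raised heap might look like; the 4g figure is only an illustrative choice, to be sized against the data and available memory:

    # defaults shipped with NiFi
    java.arg.2=-Xms512m
    java.arg.3=-Xmx512m

    # example of a larger heap
    java.arg.2=-Xms4g
    java.arg.3=-Xmx4g

The open-file limit can be raised for the NiFi user with, for example, ulimit -n 50000 before starting NiFi, or persistently via /etc/security/limits.conf.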
Large dataset on hbase
Hi,

I am new to NiFi and do not know how to process large data, such as a 1 GB CSV file, into HBase. Trying a combination of GetFile and PutHBase shell leads to a Java out-of-memory error, and a combination of ReplaceText, ExtractText, and PutHBaseJSON doesn't work on the large dataset, although it works correctly on a smaller dataset.

Can anyone please help me solve this? Thanks in advance.

Thanks & Regards,
Prabhu Mahendran