Re: Change column family

2015-05-27 Thread shweta.agrawal
Thanks for all the suggestions. I read about TransformingIterator and started implementing it: I extended the class and tried to override its abstract methods. But I can't work out where, and what, to write to change the column family. Please share your suggestions. Thanks, Shweta

Re: Question about best practices on column names

2015-05-27 Thread Andrew Wells
On the surface it adds an additional level of specification/grouping. The potential benefit in Accumulo is that, just as identical rowIDs are guaranteed to be in the same file, you can use locality groups to place specific column families into the same file as well.

Re: Change column family

2015-05-27 Thread Andrew Wells
to implement that iterator. It looks like you will only need to override replaceColumnFamily, which appears to return the new column family via its argument, so manipulate the Text object provided. On Wed, May 27, 2015 at 8:06 AM, Andrew Wells awe...@clearedgeit.com wrote: Looks like you want to
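As far as I can tell from the Accumulo 1.x source, TransformingIterator's two abstract methods are getKeyPrefix(), which returns the PartialKey prefix the transform leaves unchanged, and transformRange(SortedKeyValueIterator<Key,Value> input, KVBuffer output), which reads entries from input and appends transformed ones to output. replaceColumnFamily(Key, Text) appears to be a protected helper that returns a new Key with the given family, meant to be called from transformRange rather than overridden. Because renaming a family can change how keys sort inside a row, a subclass that does this would return PartialKey.ROW so a whole row is buffered and re-sorted. The sketch below is a stdlib-only toy model of that re-sort, not Accumulo code; ToyKey and renameFamilyInRow are hypothetical names, and visibility, timestamp and the delete flag are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of an Accumulo key: row, column family, column qualifier only.
// Real keys also carry visibility, timestamp and a delete flag, and compare
// raw bytes; String comparison behaves the same for plain ASCII names.
final class ToyKey {
    final String row, family, qualifier;

    ToyKey(String row, String family, String qualifier) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
    }

    // Sort order: row, then family, then qualifier.
    static final Comparator<ToyKey> ORDER = Comparator
            .comparing((ToyKey k) -> k.row)
            .thenComparing(k -> k.family)
            .thenComparing(k -> k.qualifier);

    @Override
    public String toString() {
        return row + ":" + family + ":" + qualifier;
    }
}

final class RenameFamilyModel {
    // Hypothetical helper: rename one column family among the entries of a
    // single row, then re-sort, because the new name may sort differently.
    // Only the row is guaranteed unchanged, which mirrors returning
    // PartialKey.ROW from getKeyPrefix() in a real subclass.
    static List<ToyKey> renameFamilyInRow(List<ToyKey> rowEntries, String from, String to) {
        List<ToyKey> out = new ArrayList<>();
        for (ToyKey k : rowEntries) {
            String family = k.family.equals(from) ? to : k.family;
            out.add(new ToyKey(k.row, family, k.qualifier));
        }
        out.sort(ToyKey.ORDER);
        return out;
    }
}
```

For example, a row holding r1:beta:q1 and r1:zeta:q2 comes back as r1:alpha:q2 followed by r1:beta:q1 after renaming zeta to alpha: the renamed entry moves ahead, which is the reordering the real iterator has to absorb.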

Question about best practices on column names

2015-05-27 Thread David Patterson
I've been trying to understand the difference between the two column name parts -- column family and column qualifier. I don't understand the value of using the columnFamily for the column name and an empty text (new Text(new byte[0])) field for the column qualifier vs. a non-unique column name

Re: Accumulo Cluster Sizing

2015-05-27 Thread Fagan, Michael
Eric, Thanks. I assume managing something like 280GB per tablet server is feasible given the various knobs available to tune performance. Regards, Mike Fagan From: Eric Newton eric.new...@gmail.com Reply-To: user@accumulo.apache.org

Re: Accumulo Cluster Sizing

2015-05-27 Thread Fagan, Michael
Thanks, everyone, for your input. I estimate I can use 20 tablet servers to support 1 million lookups a day. Are there any good rules of thumb regarding the amount of data/tablets managed by a tablet server? Regards, Mike Fagan On 5/22/15, 1:33 PM, Kepner, Jeremy - 0553 - MITLL kep...@ll.mit.edu
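Mike's estimate works out to a modest average rate. As a back-of-envelope check, assuming lookups are spread evenly across servers and evenly over the day (real peaks will be higher): 1,000,000 / 20 = 50,000 lookups per server per day, and 50,000 / 86,400 s ≈ 0.58 lookups per second per server. The hypothetical helper below just encodes that arithmetic.

```java
final class LookupRate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    // Average lookups per server per second, assuming an even spread across
    // servers and across the day; it says nothing about peak load.
    static double perServerPerSecond(long lookupsPerDay, int servers) {
        return (double) lookupsPerDay / servers / SECONDS_PER_DAY;
    }
}
```

LookupRate.perServerPerSecond(1_000_000L, 20) gives roughly 0.58, so sizing is more likely to be driven by data volume and latency targets than by the average lookup rate.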

Re: Question about best practices on column names

2015-05-27 Thread David Patterson
Thank you to all responders. This clears it up greatly. Dave P On Wed, May 27, 2015 at 10:52 AM, Christopher ctubb...@apache.org wrote: David- Both the column family (CF) and column qualifier (CQ) could be thought of as arbitrary dimensions in the key. If you only need one dimension to

Re: Question about best practices on column names

2015-05-27 Thread Christopher
David- Both the column family (CF) and column qualifier (CQ) could be thought of as arbitrary dimensions in the key. If you only need one dimension to specify your data, the other can be empty. You could also store these in separate tables, as you suggest, but part of the power of Accumulo is
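To make Christopher's "two dimensions" point concrete, here is a stdlib-only toy model (a hypothetical class, not Accumulo's API) that sorts cells by row, then family, then qualifier, the same order Accumulo uses. With the whole column name in the family and an empty qualifier, each family holds a single cell; with a shared family and distinct qualifiers, one family fetch (analogous to Scanner.fetchColumnFamily) returns the whole group as one contiguous range.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model, not the Accumulo API: cells keyed by "row\0family\0qualifier"
// in a sorted map. Assumes names contain no NUL characters.
final class TwoDimensionModel {
    private final TreeMap<String, String> cells = new TreeMap<>();

    private static String key(String row, String family, String qualifier) {
        return row + '\0' + family + '\0' + qualifier;
    }

    void put(String row, String family, String qualifier, String value) {
        cells.put(key(row, family, qualifier), value);
    }

    // Returns qualifier -> value for one family of one row. All qualifiers of
    // a family sort together, so this is a single range over the sorted map.
    Map<String, String> fetchFamily(String row, String family) {
        String prefix = row + '\0' + family + '\0';
        String end = row + '\0' + family + '\1'; // first key past this family
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : cells.subMap(prefix, true, end, false).entrySet()) {
            out.put(e.getKey().substring(prefix.length()), e.getValue());
        }
        return out;
    }
}
```

For example, after put("user1", "contact", "email", ...) and put("user1", "contact", "phone", ...), fetchFamily("user1", "contact") returns both qualifiers in sorted order, while a family stored with an empty qualifier comes back as a single entry keyed by "".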

Re: Accumulo Cluster Sizing

2015-05-27 Thread Eric Newton
You can get decent ingest concurrency when the number of tablets per server is between 20 and 80. There are so many knobs for tuning this performance that it's hard to give a simple answer. 0-1 tablets/server is bad. 1000+/server is bad. Usually. It will take time to tune your system. On Wed, May
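Combining Eric's rule of thumb with Mike's estimate of about 280GB per tablet server: at 20 to 80 tablets per server, each tablet would average 280 / 20 = 14GB down to 280 / 80 = 3.5GB, assuming data is spread evenly. (Accumulo's default table.split.threshold is 1G, so unless it is raised, 280GB per server would tend to end up in at least a few hundred tablets per server, above that range.) The hypothetical helper below only encodes the arithmetic; the bounds are Eric's guidance, not hard limits.

```java
final class TabletSizing {
    // Eric's rough guidance for decent ingest concurrency (not a hard limit).
    static final int MIN_TABLETS = 20;
    static final int MAX_TABLETS = 80;

    static boolean inSuggestedRange(int tabletsPerServer) {
        return tabletsPerServer >= MIN_TABLETS && tabletsPerServer <= MAX_TABLETS;
    }

    // Average data per tablet, assuming data is spread evenly across tablets.
    static double gbPerTablet(double gbPerServer, int tabletsPerServer) {
        return gbPerServer / tabletsPerServer;
    }
}
```

TabletSizing.gbPerTablet(280, 20) is 14.0 and TabletSizing.gbPerTablet(280, 80) is 3.5, which brackets the average tablet size that Mike's numbers imply.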

Re: Question about best practices on column names

2015-05-27 Thread Josh Elser
Couple of clarifications: * Identical rowIDs will colocate data in the same tablet, but not necessarily the same file. Tablets can have multiple files. * Locality groups will colocate data within a file, not necessarily in its own file. The RFile format supports multiple regions within the file
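Josh's point can be pictured with a toy model: a locality group is a named set of column families, each group occupies its own region inside an RFile, and a scan that fetches only some families needs to read only the regions for their groups, with unassigned families falling into a default group. The sketch below is stdlib-only and hypothetical; in the real client API, groups are configured per table, for example with TableOperations.setLocalityGroups(tableName, Map<String, Set<Text>>).

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not Accumulo code: which locality-group regions a scan must
// read, given a group -> families assignment and the families fetched.
final class LocalityGroupModel {
    // Families not assigned to any named group land in the default group.
    static final String DEFAULT_GROUP = "<default>";

    static Set<String> groupsToRead(Map<String, Set<String>> groups,
                                    Collection<String> fetchedFamilies) {
        Set<String> needed = new TreeSet<>();
        // Fetching no specific families means the scan reads every region.
        if (fetchedFamilies.isEmpty()) {
            needed.addAll(groups.keySet());
            needed.add(DEFAULT_GROUP);
            return needed;
        }
        Map<String, String> familyToGroup = new HashMap<>();
        for (Map.Entry<String, Set<String>> g : groups.entrySet()) {
            for (String family : g.getValue()) {
                familyToGroup.put(family, g.getKey());
            }
        }
        for (String family : fetchedFamilies) {
            needed.add(familyToGroup.getOrDefault(family, DEFAULT_GROUP));
        }
        return needed;
    }
}
```

With groups meta = {name, email} and blob = {image}, fetching only name touches just the meta region, so a large blob group costs nothing for that scan even though it shares the same file.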

Re: Change column family

2015-05-27 Thread Josh Elser
I believe the typical case would be to set it at the scan and major compaction scopes for the table. This would ensure that queries for data would see the transformed result and, eventually, all of the data would be rewritten to the new schema (or you could force a major compaction and know
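Josh's answer also covers the later question about whether the transformation changes the table itself. A toy model (stdlib-only, hypothetical class, not Accumulo code) makes the difference between the two scopes visible: at scan scope the iterator only changes what readers see, while at major-compaction scope the transformed entries are what gets written back. In the client API this corresponds, as I understand it, to attaching the iterator with TableOperations.attachIterator(tableName, setting, EnumSet.of(IteratorScope.scan, IteratorScope.majc)) and, to force the rewrite, calling TableOperations.compact(...). The model skips re-sorting and represents entries as plain "row:family:qualifier" strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model, not Accumulo code, of where a transforming iterator runs.
final class ScopeModel {
    private List<String> stored;                  // entries as persisted
    private final UnaryOperator<String> transform;

    ScopeModel(List<String> initial, UnaryOperator<String> transform) {
        this.stored = new ArrayList<>(initial);
        this.transform = transform;
    }

    // Scan scope: transform on the fly; persisted entries are untouched.
    List<String> scan() {
        List<String> view = new ArrayList<>();
        for (String entry : stored) {
            view.add(transform.apply(entry));
        }
        return view;
    }

    // Major-compaction scope: rewrite persisted entries through the transform.
    void majorCompact() {
        stored = scan();
    }

    List<String> storedEntries() {
        return new ArrayList<>(stored);
    }
}
```

Because scans keep running the iterator after a compaction has rewritten the data, the transform should be safe to apply twice; renaming family old to new is, since already-renamed entries no longer match.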

Re: Change column family

2015-05-27 Thread madhvi
Hi all, If anyone has worked with TransformingIterator, can you tell me whether the iterator writes the transformed changes into the Accumulo table as well, or only returns the transformed result at scan time? Can you also provide details on how to implement its abstract methods, what they are used for, and the overall workflow of the iterator?