RE: org.apache.hadoop.hbase.TableNotFoundException

2013-04-16 Thread Omkar Joshi
Hi Ted, There was a space after address (now feeling like a jackass :( ). I have another issue but will post in a new thread. Thanks a lot for the help! Regards, Omkar Joshi -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Tuesday, April 16, 2013 11:08 AM To:

Data not loaded in table via ImportTSV

2013-04-16 Thread Omkar Joshi
Hi, The background thread is this : http://mail-archives.apache.org/mod_mbox/hbase-user/201304.mbox/%3ce689a42b73c5a545ad77332a4fc75d8c1efbd80...@vshinmsmbx01.vshodc.lntinfotech.com%3E I'm referring to the HBase doc. http://hbase.apache.org/book/ops_mgt.html#importtsv Accordingly, my command

RE: Data not loaded in table via ImportTSV

2013-04-16 Thread Anoop Sam John
Hi, Have you used the tool LoadIncrementalHFiles after the ImportTSV? -Anoop- From: Omkar Joshi [omkar.jo...@lntinfotech.com] Sent: Tuesday, April 16, 2013 12:01 PM To: user@hbase.apache.org Subject: Data not loaded in table via ImportTSV

Re: org.apache.hadoop.hbase.TableNotFoundException

2013-04-16 Thread Andrea Gazzarini
I see a space between ADDRESS and the comma, shouldn't be a problem but...who knows? CUSTOMER_INFO:ADDRESS ,CUSTOMER_INFO:MOBILE Seems that the unknown column name in the log message includes the space and the comma, is that right? Cannot find row in .META. for table:*
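The stray space Andrea spotted is easy to guard against before the column spec ever reaches the tool. A minimal sketch (a hypothetical helper, not part of ImportTSV itself) that normalizes a comma-separated column list:

```python
def normalize_columns(spec):
    """Strip stray whitespace from a comma-separated importtsv column spec.

    A trailing space, as in "CUSTOMER_INFO:ADDRESS ,CUSTOMER_INFO:MOBILE",
    would otherwise become part of the column name and trigger an
    unknown-column error.
    """
    return ",".join(part.strip() for part in spec.split(","))

print(normalize_columns("HBASE_ROW_KEY,CUSTOMER_INFO:ADDRESS ,CUSTOMER_INFO:MOBILE"))
# → HBASE_ROW_KEY,CUSTOMER_INFO:ADDRESS,CUSTOMER_INFO:MOBILE
```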

RE: Re: HBase random read performance

2013-04-16 Thread Liu, Raymond
So what is lacking here? The action should also be parallelized inside the RS for each region, instead of just parallel at the RS level? Seems this will be rather difficult to implement, and for Get, might not be worthwhile? I looked at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

Re: Re: HBase random read performance

2013-04-16 Thread Nicolas Liochon
I think there is something in the middle that could be done. It was discussed here a while ago, but without any JIRA created. See thread: http://mail-archives.apache.org/mod_mbox/hbase-user/201302.mbox/%3CCAKxWWm19OC+dePTK60bMmcecv=7tc+3t4-bq6fdqeppix_e...@mail.gmail.com%3E If someone can spend

Re: Re: HBase random read performance

2013-04-16 Thread Jean-Marc Spaggiari
Hi Nicolas, I think it might be good to create a JIRA for that anyway since it seems that some users are expecting this behaviour. My 2¢ ;) JM 2013/4/16 Nicolas Liochon nkey...@gmail.com I think there is something in the middle that could be done. It was discussed here a while ago, but without

How practical is it to add a timestamp oracle on Zookeeper

2013-04-16 Thread yun peng
Hi, All, I'd like to add a global timestamp oracle on Zookeeper to assign a globally unique timestamp to each Put/Get issued from the HBase cluster. The reason I put it on Zookeeper is that each Put/Get needs to go through it and a unique timestamp needs some global centralised facility to do it. But I am

Re: How practical is it to add a timestamp oracle on Zookeeper

2013-04-16 Thread Jean-Marc Spaggiari
Hi Yun, If I understand you correctly, that means that each time you are going to do a put or a get you will need to call ZK first? Since ZK has only one master active, that means that this ZK master will be called for each HBase get/put? You are going to create a bottleneck there. I don't know
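The serialization concern Jean-Marc raises can be illustrated without ZooKeeper at all. A toy single-process sketch of a centralized timestamp oracle (all names hypothetical): every caller funnels through one lock, which stands in for the round-trip every Put/Get would make to the ZooKeeper leader.

```python
import threading

class TimestampOracle:
    """Toy centralized oracle: hands out strictly increasing timestamps.

    Every Put/Get would have to pass through this single component (in the
    real proposal, an RPC to the ZooKeeper leader), so cluster-wide
    throughput is bounded by it -- the bottleneck in question.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._ts = 0

    def next_timestamp(self):
        with self._lock:
            self._ts += 1
            return self._ts

oracle = TimestampOracle()
stamps = []

def worker():
    for _ in range(1000):
        stamps.append(oracle.next_timestamp())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All 4000 timestamps are unique even under concurrency.
assert len(set(stamps)) == 4000
```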

RE: How practical is it to add a timestamp oracle on Zookeeper

2013-04-16 Thread Bijieshan
Yes, Jean-Marc Spaggiari is right. Performance is the big problem with this approach, though ZooKeeper can help you implement this. Regards, Jieshan -Original Message- From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org] Sent: Tuesday, April 16, 2013 8:20 PM To:

Re: How practical is it to add a timestamp oracle on Zookeeper

2013-04-16 Thread Ted Yu
Have you looked at https://github.com/yahoo/omid/wiki ? The Status Oracle implementation may give you some clue. Cheers On Apr 16, 2013, at 5:14 AM, yun peng pengyunm...@gmail.com wrote: Hi, All, I'd like to add a global timestamp oracle on Zookeeper to assign globally unique timestamp for

Re: How practical is it to add a timestamp oracle on Zookeeper

2013-04-16 Thread yun peng
Hi, Jean and Jieshan, Are you saying the client can directly contact region servers? Maybe I overlooked, but I think the client may need to look up regions by first contacting ZK, as in figure 8-11 from the Definitive Guide (as attached)... Nevertheless, if it is the case, to assign a global timestamp, what is

Re: How practical is it to add a timestamp oracle on Zookeeper

2013-04-16 Thread Jean-Marc Spaggiari
Hi Yun, Attachments are not working on the mailing list. However, everyone using HBase should have the book on their desk, so I have ;) On figure 8-11, you can see that the client will contact ZK to know where the root region is. Then the root region to find the meta, and so on. BUT this will

Re: schema design: rows vs wide columns

2013-04-16 Thread Ted Yu
If there is no objection, I will create a JIRA to increase the maximum number of column families described here: http://hbase.apache.org/book.html#number.of.cfs Cheers On Mon, Apr 8, 2013 at 7:21 AM, Doug Meil doug.m...@explorysmedical.com wrote: For the record, the refGuide mentions

Re: schema design: rows vs wide columns

2013-04-16 Thread Jean-Marc Spaggiari
Can we add more details than just changing the maximum CF number? Maybe we can explain why there are some impacts, or what to consider? JM 2013/4/16 Ted Yu yuzhih...@gmail.com If there is no objection, I will create a JIRA to increase the maximum number of column families described here:

Re: schema design: rows vs wide columns

2013-04-16 Thread Ted Yu
bq. Maybe we can explain why there are some impacts, or what to consider? The above would be covered in the JIRA. Thanks On Tue, Apr 16, 2013 at 7:04 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Can we add more details than just changing the maximum CF number? Maybe we can explain

Re: schema design: rows vs wide columns

2013-04-16 Thread Michael Segel
I think the important thing about Column Families is trying to understand how to use them properly in a design. Sparse data may make sense. It depends on the use case and an understanding of the trade-offs. It all depends on how the data breaks down into specific use cases. Keeping CFs

Re: Re: HBase random read performance

2013-04-16 Thread lars hofhansl
This is fundamentally different, though. A scanner by default scans all regions serially, because it promises to return all rows in sort order. A multi get is already parallelized across regions (and hence across region servers). Before we do a lot of work here we should first make sure that
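Lars's point, that a multi-get already fans out across region servers while a scan walks regions serially, can be sketched in miniature. The region-server placement and per-server RPC below are simulated stand-ins, not the real HBase client API:

```python
from concurrent.futures import ThreadPoolExecutor

# Simulated placement: row key -> region server hosting its region.
REGION_SERVER_FOR = {"r1": "rs-a", "r2": "rs-a", "r3": "rs-b", "r4": "rs-c"}
DATA = {"r1": 1, "r2": 2, "r3": 3, "r4": 4}

def fetch_from_server(server, keys):
    # Stand-in for one multi-get RPC to a single region server.
    return {k: DATA[k] for k in keys}

def multi_get(keys):
    """Group keys by region server, then issue per-server calls in parallel.

    This mirrors the client-side batching Lars describes: one concurrent
    request per region server, not one serial pass over regions.
    """
    by_server = {}
    for k in keys:
        by_server.setdefault(REGION_SERVER_FOR[k], []).append(k)
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fetch_from_server, s, ks)
                   for s, ks in by_server.items()]
        for f in futures:
            results.update(f.result())
    return results

print(multi_get(["r1", "r3", "r4"]))  # one parallel call each to rs-a, rs-b, rs-c
```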

How to remove all traces of a dropped table.

2013-04-16 Thread David Koch
Hello, We had problems with not being able to scan over a large (~8k regions) table so we disabled and dropped it and decided to re-import data from scratch into a table with the SAME name. This never worked and I list some log extracts below. The only way to make the import go through was to

Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

2013-04-16 Thread rajesh babu chintaguntla
Hi Dylan, $HBASE_HOME/bin/hbase hbck -fix The above command can bring the cluster back to a normal state. If the master restarted while a ROOT/META region was closing (during a move or balance), then the problem you reported can easily occur. Thanks for the detailed logs. Raised an issue for this.

Renaming .snapshot directory used by HBase snapshots

2013-04-16 Thread Ted Yu
Hi, In the first half of this email, let me summarize our findings: Yesterday afternoon Huned and I discovered an issue while playing with HBase Snapshots on top of Hadoop's Snapshot branch ( http://svn.apache.org/viewvc/hadoop/common/branches/HDFS-2802/). HDFS (built from HDFS-2802 branch)

RE: Data not loaded in table via ImportTSV

2013-04-16 Thread Omkar Joshi
Hi Anoop, Actually, I got confused after reading the doc. - I thought a simple importtsv command (which also takes the table name as an argument) would suffice. But as you pointed out, completebulkload is required. HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar
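For reference, importtsv consumes tab-separated lines whose fields line up positionally with the -Dimporttsv.columns list. A small sketch that generates such an input file; the column layout follows the CUSTOMER_INFO family from this thread, while the row data and file name are made up for illustration:

```python
# Each line: row key, then one field per declared column, tab-separated,
# matching e.g.
#   -Dimporttsv.columns=HBASE_ROW_KEY,CUSTOMER_INFO:ADDRESS,CUSTOMER_INFO:MOBILE
rows = [
    ("cust001", "12 Main St", "9876543210"),
    ("cust002", "34 Side Rd", "9123456780"),
]
with open("customers.tsv", "w") as f:
    for key, address, mobile in rows:
        f.write("\t".join((key, address, mobile)) + "\n")
```

A file like this would then be copied into HDFS, run through importtsv (with -Dimporttsv.bulk.output to produce HFiles), and finally loaded with completebulkload, as Anoop pointed out.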

how to evaluate the up-limit number of Regions?

2013-04-16 Thread Bing Jiang
hi all, I want to know whether there is a criterion or bible to measure the capacity of an hbase cluster. From my view, it depends on: 1. HDFS volume 2. system memory setting 3. Network IO, etc. However, with the increase of the number of tables and regions, how to evaluate the ability of service is not enough,
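One commonly cited back-of-the-envelope bound for the region count per server comes from memstore memory: regions ≈ (RS heap × memstore fraction) / (memstore flush size × actively written column families). The numbers below are illustrative defaults, not a recommendation; exact values vary by HBase version and configuration.

```python
def max_active_regions(heap_bytes, memstore_fraction, flush_size_bytes, cf_count):
    """Rough upper bound on regions a region server can host while still
    giving each actively written memstore its full flush size.

    Assumes every region takes writes on every column family; idle regions
    cost far less, so real clusters can often hold more.
    """
    return int(heap_bytes * memstore_fraction / (flush_size_bytes * cf_count))

# 16 GB heap, 40% reserved for memstores, 128 MB flush size, 1 column family
print(max_active_regions(16 * 1024**3, 0.4, 128 * 1024**2, 1))  # → 51
```

This is only one axis of Bing's question; HDFS volume and network IO impose their own independent limits.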

Re: how to evaluate the up-limit number of Regions?

2013-04-16 Thread Ted Yu
I am not sure I understand your question completely. Were you asking the upper bound of number of regions, given certain hardware resources ? Can you outline your expectation for throughput / latency ? I guess answers you may get would vary, depending on type of application, etc. On Tue, Apr

Loading text files from local file system

2013-04-16 Thread Omkar Joshi
The background thread is here : http://mail-archives.apache.org/mod_mbox/hbase-user/201304.mbox/%3ce689a42b73c5a545ad77332a4fc75d8c1efbe84...@vshinmsmbx01.vshodc.lntinfotech.com%3E Following are the commands that I'm using to load files onto HBase : HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase

Re: Loading text files from local file system

2013-04-16 Thread Surendra , Manchikanti
Hi Joshi, You can use Flume + AsyncHBaseSink / HBaseSink to move data from the local file system to HBase. Thanks, Surendra M -- Surendra Manchikanti On Wed, Apr 17, 2013 at 10:01 AM, Omkar Joshi omkar.jo...@lntinfotech.com wrote: The background thread is here :