deleteOnExit when JVM shutsdown non gracefully

2013-04-10 Thread Asaf Mesika
Hi, In the CoprocessorHost.java file, there's the following code section used to load a coprocessor jar:
    fs.copyToLocalFile(path, dst);
    File tmpLocal = new File(dst.toString());
    tmpLocal.deleteOnExit();
There's an assumption here that the JVM will shut down gracefully (as

Re: deleteOnExit when JVM shutsdown non gracefully

2013-04-10 Thread Ted Yu
Interesting. File a JIRA ? Thanks On Apr 10, 2013, at 2:30 AM, Asaf Mesika asaf.mes...@gmail.com wrote: Hi, In the CoprocessorHost.java file, there's the following code section used to load a coprocessor jar: fs.copyToLocalFile(path, dst); File tmpLocal = new

MapReduce: Reducers partitions.

2013-04-10 Thread Jean-Marc Spaggiari
Hi, quick question. How is the data from the map tasks partitioned for the reducers? If there is 1 reducer, it's easy, but if there are more, are all the same keys guaranteed to end up on the same reducer? Or not necessarily? If they are not, how can we provide a partitioning function? Thanks, JM

Re: MapReduce: Reducers partitions.

2013-04-10 Thread Nitin Pawar
I hope I understood what you are asking. If not, pardon me :) A few lines from the Hadoop developer handbook: The *Partitioner* class determines which partition a given (key, value) pair will go to. The default partitioner computes a hash value for the key and assigns the partition based
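
The hash-based assignment described in this message can be sketched in a few lines of plain Java. This is a minimal illustration, not Hadoop code: the class and method names are hypothetical, and the formula (mask off the sign bit, then take the remainder by the number of reducers) is the commonly used one for hash partitioning. It shows why equal keys always land on the same reducer: the result depends only on the key's hash and the reducer count.

```java
final class HashPartitionSketch {
    // Hypothetical helper: masks off the sign bit so the result is never
    // negative, then maps the hash onto one of numPartitions buckets.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"row-a", "row-b", "row-a", "row-c"};
        for (String key : keys) {
            // The two "row-a" entries print the same reducer index.
            System.out.println(key + " -> reducer " + partitionFor(key, 3));
        }
    }
}
```

Two keys with different hashes may still share a reducer; the only guarantee is that identical keys never end up on different ones.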

Re: MapReduce: Reducers partitions.

2013-04-10 Thread Jean-Marc Spaggiari
Hi Nitin, You got my question correctly. However, I'm wondering how it works when it's done in HBase. Do we have default partitioners so that we have the same guarantee that records mapping to one key go to the same reducer? Or do we have to implement this on our own? JM 2013/4/10 Nitin Pawar

Re: MapReduce: Reducers partitions.

2013-04-10 Thread Ted Yu
Jean-Marc: Take a look at HRegionPartitioner which is in both mapred and mapreduce packages: * This is used to partition the output keys into groups of keys. * Keys are grouped according to the regions that currently exist * so that each reducer fills a single region so load is distributed.

Re: MapReduce: Reducers partitions.

2013-04-10 Thread Nitin Pawar
To add what Ted said, the same discussion happened on the question Jean asked https://issues.apache.org/jira/browse/HBASE-1287 On Wed, Apr 10, 2013 at 7:28 PM, Ted Yu yuzhih...@gmail.com wrote: Jean-Marc: Take a look at HRegionPartitioner which is in both mapred and mapreduce packages:

Re: MapReduce: Reducers partitions.

2013-04-10 Thread Jean-Marc Spaggiari
Thanks Ted. It's exactly where I was looking at now. I was close. I will take a deeper look. Thanks Nitin for the link. I will read that too. JM 2013/4/10 Nitin Pawar nitinpawar...@gmail.com To add what Ted said, the same discussion happened on the question Jean asked

Comment on HBASE-6782

2013-04-10 Thread Pablo Musa
Hey guys, I have one comment to make on the issue [1]. Where is the best place to post it? My Comment: The solution in the issue corrects the invalid values, but inserts quotes around the key as:

Re: Comment on HBASE-6782

2013-04-10 Thread Jean-Marc Spaggiari
Hi Pablo, The best way is to comment directly on the JIRA you are talking about. If there are no reactions, you can drop an email on the distribution list, but the JIRA will be the best place to start. JM 2013/4/10 Pablo Musa pa...@psafe.com Hey guys, I have one comment to do over the issue

Re: Comment on HBASE-6782

2013-04-10 Thread Ted Yu
Pablo: HBASE-6782 https://issues.apache.org/jira/browse/HBASE-6782 has been resolved. You can open a new one. Cheers On Wed, Apr 10, 2013 at 7:16 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Pablo, The best way is to comment directly on the JIRA you are talking about. If

class cast exception and setting operation timeout on a pooled HTable

2013-04-10 Thread Jim the Standing Bear
Hi, When I use HTablePool to perform some HBase data loading operations, I encountered a problem where the Put operation seemed to hang forever. A little bit of digging shows that the default client operation timeout is something like 2 billion ms. HTable provides getter and setter methods on

Re: class cast exception and setting operation timeout on a pooled HTable

2013-04-10 Thread Ted Yu
PooledHTable implements HTableInterface through delegate, table. I see this method:
    /**
     * Expose the wrapped HTable to tests in the same package
     *
     * @return wrapped htable
     */
    HTableInterface getWrappedTable() { return table; }
If you just want to verify

Re: class cast exception and setting operation timeout on a pooled HTable

2013-04-10 Thread Ted Yu
bq. there is no way to directly set the operation timeout on a pooled table Right. Cheers On Wed, Apr 10, 2013 at 8:26 AM, Jim the Standing Bear standingb...@gmail.com wrote: Thanks Ted. It appears the implementation has changed from v0.92 to v0.94 (PooledHTable used to extend HTable in

Re: class cast exception and setting operation timeout on a pooled HTable

2013-04-10 Thread Nicolas Liochon
But don't forget you don't have to use pooled tables anymore. You can create the tables you need on the fly, see 9.3.1.1. Connection Pooling. IIRC, it's available in the version you're using (but I haven't checked). Cheers, Nicolas On Wed, Apr 10, 2013 at 5:26 PM, Jim the Standing Bear

Re: class cast exception and setting operation timeout on a pooled HTable

2013-04-10 Thread Jim the Standing Bear
Thanks Ted and Nicolas. It is good to know about the more controllable way to create connection pooling. I will work that into the next version of our code. -- Jim On Wed, Apr 10, 2013 at 12:09 PM, Nicolas Liochon nkey...@gmail.com wrote: But don't forget you don't have to use pooled tables

What your PMC sent to the apache board as our quarterly report

2013-04-10 Thread Stack
Every quarter the HBase PMC has to make a report to the Apache Board. Here is what we sent for this period (a minor, private item has been redacted). Yours, St.Ack HBase is a distributed column-oriented database built on top of Hadoop Common and Hadoop HDFS ISSUES FOR THE BOARD’s ATTENTION

Re: deleteOnExit when JVM shutsdown non gracefully

2013-04-10 Thread Andrew Purtell
Perhaps we can simply unlink the file after load. On *nix the OS would GC the file data after the JVM process terminates and the filehandles are closed. Of course this won't work on Windows. (But I don't care about that.) We added a change such that now all coprocessor jars are brought locally to
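
One way to read the "unlink after load" idea is to delete the local copy explicitly as soon as loading has finished, rather than registering it with deleteOnExit() and depending on a clean JVM shutdown. The sketch below is only an illustration under that assumption: the class and method names are hypothetical, and "loading" is reduced to reading the file's bytes, which is not what CoprocessorHost does with a jar.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class EagerTempCleanupSketch {
    // Hypothetical stand-in for "load the jar": here it just reads the bytes.
    static byte[] loadThenDelete(Path tmpLocal) throws IOException {
        try {
            return Files.readAllBytes(tmpLocal);
        } finally {
            // Remove the local copy right away instead of waiting for JVM exit.
            Files.deleteIfExists(tmpLocal);
        }
    }

    // Creates a throwaway file, loads it, and reports whether the copy is gone.
    static boolean copyIsGoneAfterLoad() {
        try {
            Path tmp = Files.createTempFile("coprocessor-sketch", ".jar");
            Files.write(tmp, new byte[] {1, 2, 3});
            byte[] loaded = loadThenDelete(tmp);
            return loaded.length == 3 && !Files.exists(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("temp copy removed after load: " + copyIsGoneAfterLoad());
    }
}
```

Because the delete sits in a finally block, the copy is removed even when loading throws, so a later kill -9 leaves nothing behind from that load.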

Re: Fixing badly distributed table manually.

2013-04-10 Thread Vincent Barat
Hi, Sorry for not responding: I'm not on the list very often. It seems to be of interest for some of you, so we will publish this script on GitHub, so that everybody can test and improve it. More info later... Regards, Le 24/12/12 21:23, anil gupta a écrit : Hi Vincent, I don't know

Re: MapReduce: Reducers partitions.

2013-04-10 Thread Jean-Marc Spaggiari
So. I looked at the code, and I have one comment/suggestion here. If the table we are outputting to has regions, then partitions are built around those, and that's fine. But if the table is totally empty with a single region, even if we setNumReduceTasks to 2 or more, all the keys will go on the

Re: MapReduce: Reducers partitions.

2013-04-10 Thread Graeme Wallace
What's the behavior then if you return hash % num_reducers and you have no splits defined? When the reducer writes to the table, does the region server local to the reducer create a new region? Graeme On Wed, Apr 10, 2013 at 1:26 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: So. I

Re: MapReduce: Reducers partitions.

2013-04-10 Thread Jean-Marc Spaggiari
Hi Graeme, No. The reducer will simply write on the table the same way you are doing a regular Put. If a split is required because of the size, then the region will be split, but at the end, there will not necessarily be any region split. In the use case described below, all the 600 lines will

Re: MapReduce: Reducers partitions.

2013-04-10 Thread Graeme Wallace
Ok. Thanks. On Wed, Apr 10, 2013 at 2:01 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Graeme, No. The reducer will simply write on the table the same way you are doing a regular Put. If a split is required because of the size, then the region will be split, but at the end,

composite query on hbase and rcfile

2013-04-10 Thread ur lops
Hi, Does anyone know if Hive can run a composite query over RCFile and HBase in the same query? A quick answer will be highly appreciated. Thanks in advance. Rob

Hbase Stargate Returns Scrambled Values

2013-04-10 Thread Ameya Kantikar
When I run a simple query on Stargate it returns scrambled values, e.g.: curl -H "Accept: application/json" http://localhost:9001/t2/*/cf1

Re: Essential column family performance

2013-04-10 Thread lars hofhansl
Fix is committed and will be in 0.94.7. I guess we should have a discussion at some point on whether we should always switch this feature on (it is disabled by default), as we now can no longer find any case where enabling it is slower. -- Lars From: Anoop

Re: Essential column family performance

2013-04-10 Thread Ted Yu
Once 0.94.7 is released and more users try this feature out, we surely can consider turning it on (in 0.94.8) Cheers On Wed, Apr 10, 2013 at 4:02 PM, lars hofhansl la...@apache.org wrote: Fix is committed and will be in 0.94.7. I guess we should have a discussion at some point on whether we

Re: Essential column family performance

2013-04-10 Thread Stack
Turn it on by default in trunk/0.95 I'd say. St.Ack On Wed, Apr 10, 2013 at 4:02 PM, lars hofhansl la...@apache.org wrote: Fix is committed and will be in 0.94.7. I guess we should have a discussion at some point on whether we should always switch this feature on (it is disabled by

Re: Hbase Stargate Returns Scrambled Values

2013-04-10 Thread Andrew Purtell
Ask for binary results, i.e. use an Accept header of Accept: application/octet-stream. This has limitations though, only one result can be returned, so that wildcard query you provide as an example won't work. You'll have to fully specify the path to a cell. Otherwise, for JSON and XML
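
For the JSON and XML representations, the values that look scrambled are typically Base64-encoded bytes, so a client can decode them itself instead of switching to the binary representation. The following is a minimal sketch under that assumption: the class name and sample value are made up, and only the JDK's java.util.Base64 is used, not an HBase client API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class StargateValueDecodeSketch {
    // Decodes one Base64 field (row key, column name or cell value) and
    // interprets the bytes as UTF-8 text. Real cells may hold non-text bytes.
    static String decode(String base64Field) {
        byte[] raw = Base64.getDecoder().decode(base64Field);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate what the gateway would send for a cell holding "value1".
        String wire = Base64.getEncoder()
                .encodeToString("value1".getBytes(StandardCharsets.UTF_8));
        System.out.println(wire + " -> " + decode(wire));
    }
}
```

Cells that store serialized numbers or other binary data need a matching decoder for the raw bytes; treating them as UTF-8 text only works for string values.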

Re: MapReduce: Reducers partitions.

2013-04-10 Thread Ted Yu
bq. I think it will be better to return something like keycrc%numPartitions Can you explain how keycrc is obtained ? I think if we change this logic, we should make it serve (relatively) more general use case. But I didn't find, in hadoop 1.0, how Partitioner can accept parameters: $ find .
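
One possible reading of the quoted "keycrc%numPartitions" is a CRC32 checksum of the row key bytes taken modulo the number of reducers. The thread does not say how keycrc would actually be obtained, so the sketch below is an assumption, with hypothetical class and method names, and it uses only java.util.zip.CRC32 from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class CrcFallbackPartitionSketch {
    // Assumed meaning of "keycrc": CRC32 over the raw row key bytes.
    // getValue() is a non-negative long, so the remainder is a valid index.
    static int partitionFor(byte[] rowKey, int numPartitions) {
        CRC32 crc = new CRC32();
        crc.update(rowKey, 0, rowKey.length);
        return (int) (crc.getValue() % numPartitions);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"row-001", "row-002", "row-003"}) {
            byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
            System.out.println(key + " -> reducer " + partitionFor(bytes, 2));
        }
    }
}
```

With a scheme like this, a table with a single region would still spread keys across all configured reducers, at the cost of no longer aligning reducers with regions.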

Re: MapReduce: Reducers partitions.

2013-04-10 Thread Stack
On Wed, Apr 10, 2013 at 6:54 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Nitin, You got my question correctly. However, I'm wondering how it's working when it's done into HBase. We use the default MapReduce partitioner:

Re: MapReduce: Reducers partitions.

2013-04-10 Thread Stack
On Wed, Apr 10, 2013 at 12:01 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Graeme, No. The reducer will simply write on the table the same way you are doing a regular Put. If a split is required because of the size, then the region will be split, but at the end, there will not

Re: composite query on hbase and rcfile

2013-04-10 Thread Azuryy Yu
What does a composite query mean? Hive's queries don't depend on the file format; they can be run on text files, sequence files, RCFile, etc. On Thu, Apr 11, 2013 at 6:14 AM, ur lops urlop...@gmail.com wrote: Hi, Does anyone know, if hive can run a composite query over RCFILE and HBASE in the

RE: composite query on hbase and rcfile

2013-04-10 Thread Liu, Raymond
I guess Rob means using one query to query an RCFile table and an HBase table at the same time. If your query is on two tables, one on RCFile and another on HBase through the HBase storage handler, I think that should be OK. Best Regards, Raymond Liu what's mean a composite query? Hive's query doesn't

Re: composite query on hbase and rcfile

2013-04-10 Thread ur lops
Hi Raymond and Azuryy, I appreciate the response. Raymond answered my question and this is what I was looking for. Best Rob On Wed, Apr 10, 2013 at 7:08 PM, Liu, Raymond raymond@intel.com wrote: I guess rob mean that use one query to query rcfile and