Usually, when I run a MapReduce job on either Spark or Hadoop, I just put the *-site.xml files into the war I submit to the cluster and that's it. I think the problem appeared when I made the HTable a private transient field and moved the table instantiation into the configure method. Could that be a valid reason? We still have to do a deeper debug, but I'm trying to figure out where to investigate.

On Nov 12, 2014 8:03 PM, "Robert Metzger" <rmetz...@apache.org> wrote:
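The transient-field hypothesis above can be illustrated with a minimal, dependency-free sketch (all names hypothetical, a plain `StringBuilder` stands in for the non-serializable HTable): a transient field is lost when the object is serialized and shipped to the cluster, so it must be re-created in a configure()-style lifecycle method on the worker before use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch of the pattern being discussed: the "table" field is transient,
// so it does not survive serialization and must be rebuilt in configure().
public class TransientFieldDemo implements Serializable {

    private final String tableName;          // serialized with the object
    private transient StringBuilder table;   // NOT serialized; null after deserialization

    public TransientFieldDemo(String tableName) {
        this.tableName = tableName;
    }

    // Called on the worker after deserialization, like an InputFormat's configure().
    public void configure() {
        table = new StringBuilder(tableName);
    }

    public boolean isTableReady() {
        return table != null;
    }

    // Serialize and deserialize, simulating shipping the object to a worker.
    public static TransientFieldDemo roundTrip(TransientFieldDemo in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(in);
        oos.flush();
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (TransientFieldDemo) ois.readObject();
    }

    public static void main(String[] args) throws Exception {
        TransientFieldDemo local = new TransientFieldDemo("mytable");
        local.configure();
        TransientFieldDemo shipped = roundTrip(local);
        System.out.println(shipped.isTableReady()); // false: transient field was lost in transit
        shipped.configure();
        System.out.println(shipped.isTableReady()); // true: re-created on the "worker"
    }
}
```

If the field is touched before configure() runs on the cluster (but after it ran locally), the symptom is exactly a job that works in the local executor and fails remotely.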
Hi,
Maybe it's an issue with the classpath? As far as I know, Hadoop reads the configuration files from the classpath. Maybe the hbase-site.xml file is not accessible through the classpath when running on the cluster?

On Wed, Nov 12, 2014 at 7:40 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

Today we tried to execute a job on the cluster instead of on the local executor and we faced the problem that the hbase-site.xml was basically ignored. Is there a reason why the TableInputFormat works correctly in the local environment while it doesn't on a cluster?

On Nov 10, 2014 10:56 AM, "Fabian Hueske" <fhue...@apache.org> wrote:

I don't think we need to bundle the HBase input and output format in a single PR. So, I think we can proceed with the IF only and target the OF later. However, the fix for Kryo should be in the master before merging the PR. Till is currently working on that and said he expects this to be done by the end of the week.

Cheers, Fabian

2014-11-07 12:49 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

I fixed also the profile for Cloudera CDH5.1.3. You can build it with the command:

mvn clean install -Dmaven.test.skip=true -Dhadoop.profile=2 -Pvendor-repos,cdh5.1.3

However, it would be good to generate the specific jar when releasing (e.g. flink-addons:flink-hbase:0.8.0-hadoop2-cdh5.1.3-incubating).

Best,
Flavio

On Fri, Nov 7, 2014 at 12:44 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

I've just updated the code on my fork (synced with the current master and applied improvements coming from comments on the related PR). I still have to understand how to write results back to an HBase Sink/OutputFormat...
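Robert's classpath hypothesis is easy to verify directly: Hadoop-style configuration loaders discover *-site.xml files via the classpath, so a plain resource lookup (no Hadoop dependency needed) tells you whether hbase-site.xml is visible inside the submitted jar. A minimal check:

```java
// Dependency-free check of the classpath hypothesis: if this lookup
// returns null on the cluster, hbase-site.xml was simply not shipped
// (or not placed at the jar root) and Hadoop's Configuration cannot see it.
public class ClasspathCheck {

    static boolean onClasspath(String resource) {
        return ClasspathCheck.class.getClassLoader().getResource(resource) != null;
    }

    public static void main(String[] args) {
        // On a correctly packaged job jar this should print true.
        System.out.println(onClasspath("hbase-site.xml"));
    }
}
```

Logging this value at job startup on the cluster would distinguish a packaging problem from a configuration-parsing problem.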
On Mon, Nov 3, 2014 at 12:05 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

Thanks for the detailed answer. So if I run a job from my machine I'll have to download all the scanned data in a table..right?

Still regarding the GenericTableOutputFormat, it is not clear to me how to proceed. I saw in the hadoop-compatibility addon that it is possible to have such compatibility using the HadoopUtils class, so the open method should become something like:

@Override
public void open(int taskNumber, int numTasks) throws IOException {
	if (Integer.toString(taskNumber + 1).length() > 6) {
		throw new IOException("Task id too large.");
	}
	// Build a zero-padded Hadoop task attempt id such as "attempt__0000_r_000003_0".
	TaskAttemptID taskAttemptID = TaskAttemptID.forName("attempt__0000_r_"
			+ String.format("%" + (6 - Integer.toString(taskNumber + 1).length()) + "s", " ").replace(" ", "0")
			+ Integer.toString(taskNumber + 1)
			+ "_0");
	// Hadoop 1 property names
	this.configuration.set("mapred.task.id", taskAttemptID.toString());
	this.configuration.setInt("mapred.task.partition", taskNumber + 1);
	// Hadoop 2.2 property names
	this.configuration.set("mapreduce.task.attempt.id", taskAttemptID.toString());
	this.configuration.setInt("mapreduce.task.partition", taskNumber + 1);
	try {
		this.context = HadoopUtils.instantiateTaskAttemptContext(this.configuration, taskAttemptID);
	} catch (Exception e) {
		throw new RuntimeException(e);
	}
	final HFileOutputFormat2 outFormat = new HFileOutputFormat2();
	try {
		this.writer = outFormat.getRecordWriter(this.context);
	} catch (InterruptedException iex) {
		throw new IOException("Opening the writer was interrupted.", iex);
	}
}

But I'm not sure about how to pass the JobConf to the class, whether to merge config files, where HFileOutputFormat2 writes the data, and how to implement the public void writeRecord(Record record) API. Could I have a little chat off the mailing list with the implementor of this extension?

On Mon, Nov 3, 2014 at 11:51 AM, Fabian Hueske <fhue...@apache.org> wrote:

Hi Flavio,

let me try to answer your last question from the user's list (to the best of my HBase knowledge):

"I just wanted to know if and how region splitting is handled. Can you explain to me in detail how Flink and HBase work together? What is not fully clear to me is when computation is done by the region servers and when data starts to flow to a Flink worker (which in my test job is only my PC), and how to read the important logged info to understand whether my job is performing well."

HBase partitions its tables into so-called "regions" of keys and stores the regions distributed across the cluster using HDFS. I think an HBase region can be thought of as an HDFS block. To make reading an HBase table efficient, regions should be read locally, i.e., an InputFormat should primarily read regions that are stored on the same machine it is running on. Flink's InputSplits partition the HBase input by regions and add information about the storage location of each region. During execution, input splits are assigned to InputFormats that can do local reads.
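Fabian's description of locality-aware split assignment can be sketched with simplified stand-in types (none of these are Flink's or HBase's actual classes): a split remembers the hosts that store its region, and the scheduler hands a worker a local split whenever one remains, falling back to a remote one otherwise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of region-aware split assignment: each input split
// records the hosts holding its HBase region, and assignment prefers a
// split whose data is local to the requesting worker.
public class LocalitySketch {

    static class RegionSplit {
        final byte[] startKey, endKey;  // the region's key range (not used by the demo logic)
        final Set<String> hosts;        // machines storing the region's HDFS blocks

        RegionSplit(byte[] startKey, byte[] endKey, String... hosts) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.hosts = new HashSet<>(Arrays.asList(hosts));
        }

        boolean isLocalTo(String worker) {
            return hosts.contains(worker);
        }
    }

    // Hand out a local split for the worker if any remains; otherwise any remote one.
    static RegionSplit nextSplit(List<RegionSplit> remaining, String worker) {
        for (RegionSplit s : remaining) {
            if (s.isLocalTo(worker)) {
                remaining.remove(s);
                return s;
            }
        }
        return remaining.isEmpty() ? null : remaining.remove(0);
    }

    public static void main(String[] args) {
        List<RegionSplit> splits = new ArrayList<>(List.of(
                new RegionSplit(new byte[]{0}, new byte[]{1}, "nodeA"),
                new RegionSplit(new byte[]{1}, new byte[]{2}, "nodeB")));
        // nodeB gets its local region first, even though it is listed second.
        System.out.println(nextSplit(splits, "nodeB").isLocalTo("nodeB")); // true
    }
}
```

This also answers the single-machine case from the quoted question: when the only worker is your PC, no split is local, so every region's data flows over the network to you.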
Best, Fabian

2014-11-03 11:13 GMT+01:00 Stephan Ewen <se...@apache.org>:

Hi!

The way of passing parameters through the configuration is very old (the original HBase format dates back to that time). I would simply make the HBase format take those parameters through the constructor.

Greetings,
Stephan

On Mon, Nov 3, 2014 at 10:59 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

The problem is that I also removed the GenericTableOutputFormat because there is an incompatibility between hadoop1 and hadoop2 for the classes TaskAttemptContext and TaskAttemptContextImpl. Then it would be nice if the user didn't have to worry about passing the pact.hbase.jtkey and pact.job.id parameters. I think it is probably a good idea to remove hadoop1 compatibility, keep the HBase addon enabled only for hadoop2 (as before), and decide how to manage those two parameters.

On Mon, Nov 3, 2014 at 10:19 AM, Stephan Ewen <se...@apache.org> wrote:

It is fine to remove it, in my opinion.
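Stephan's suggestion, constructor parameters instead of Configuration string keys, can be contrasted in a short sketch (the class name and the `pact.hbase.table` key are illustrative, not the actual connector's): stringly-typed configuration lookups fail silently on a typo, while a constructor argument is checked at compile time and can be validated eagerly.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch contrasting configuration-key plumbing with constructor parameters.
public class ConstructorConfigSketch {

    // Old style: untyped key/value lookup; a typo in the key silently yields the default.
    static String tableFromConfig(Map<String, String> conf) {
        return conf.getOrDefault("pact.hbase.table", "");
    }

    // New style: the parameter is part of the API surface and validated up front.
    static class HBaseStyleInputFormat {
        private final String tableName;

        HBaseStyleInputFormat(String tableName) {
            if (tableName == null || tableName.isEmpty()) {
                throw new IllegalArgumentException("tableName required");
            }
            this.tableName = tableName;
        }

        String getTableName() {
            return tableName;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("pact.hbase.tabel", "t1");  // typo in the key goes unnoticed at compile time
        System.out.println(tableFromConfig(conf).isEmpty());                 // true: value silently lost
        System.out.println(new HBaseStyleInputFormat("t1").getTableName()); // t1
    }
}
```

The same argument applies to internal bookkeeping values like pact.hbase.jtkey and pact.job.id: anything the format needs should reach it through its own fields rather than through keys the user must remember to set.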
On Mon, Nov 3, 2014 at 10:11 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

That is one class I removed because it was using the deprecated GenericDataSink API. I can restore them, but then it would be a good idea to remove those warnings (also because, from what I understood, the Record APIs are going to be removed).

On Mon, Nov 3, 2014 at 9:51 AM, Fabian Hueske <fhue...@apache.org> wrote:

I'm not familiar with the HBase connector code, but are you maybe looking for the GenericTableOutputFormat?

2014-11-03 9:44 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

I was trying to modify the example setting hbaseDs.output(new HBaseOutputFormat()); but I can't see any HBaseOutputFormat class..maybe we shall use another class?

On Mon, Nov 3, 2014 at 9:39 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

Maybe that's something I could add to the HBase example and that could be better documented in the wiki.

Since we're talking about the wiki: I was looking at the Java API (http://flink.incubator.apache.org/docs/0.6-incubating/java_api_guide.html) and the link to the KMeans example is not working (where it says "For a complete example program, have a look at KMeans Algorithm").

Best,
Flavio

On Mon, Nov 3, 2014 at 9:12 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

Ah ok, perfect! That was the reason why I removed it :)

On Mon, Nov 3, 2014 at 9:10 AM, Stephan Ewen <se...@apache.org> wrote:

You do not really need an HBase data sink.
You can call "DataSet.output(new HBaseOutputFormat())".

Stephan

On 02.11.2014 23:05, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:

Just one last thing: I removed the HbaseDataSink because I think it was using the old APIs. Can someone help me in updating that class?

On Sun, Nov 2, 2014 at 10:55 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

Indeed, this time the build has been successful :)

On Sun, Nov 2, 2014 at 10:29 AM, Fabian Hueske <fhue...@apache.org> wrote:

You can also set up Travis to build your own GitHub repositories by linking it to your GitHub account. That way Travis can build all your branches (and you can also trigger rebuilds if something fails). Not sure if we can manually retrigger builds on the Apache repository.
Support for Hadoop 1 and 2 is indeed a very good addition :-)

For the discussion about the PR itself, I would need a bit more time to become more familiar with HBase. I also do not have an HBase setup available here. Maybe somebody else in the community who was involved with a previous version of the HBase connector could comment on your question.

Best, Fabian

2014-11-02 9:57 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

As suggested by Fabian, I moved the discussion to this mailing list.

I think that what is still to be discussed is how to retrigger the build on Travis (I don't have an account) and whether the PR can be integrated.
Maybe what I can do is move the HBase example into the test package (right now I left it in the main folder) so it will force Travis to rebuild. I'll do it within a couple of hours.

Another thing I forgot to say is that the hbase extension is now compatible with both hadoop 1 and 2.

Best,
Flavio