outputCollector vs. Localfile

2011-05-19 Thread Mark question
This is puzzling me ...

  With a mapper producing output of size ~400 MB, which one is supposed
to be faster?

 1) OutputCollector, which will write to a local file and then copy to HDFS,
since I don't have reducers.

  2) Open a unique local file inside "mapred.local.dir" for each mapper.

   I expected (2) to be faster, but (1) actually was. Can someone explain?

 Thanks,
Mark


Re: Linker errors with Hadoop pipes

2011-05-19 Thread Tom Melendez
I'm on Ubuntu and use pipes.  These are my ssl packages, notice libssl
and libssl-dev in particular:

supertom@hadoop-2:~/h-v8$ dpkg -l |grep -i ssl
ii  libopenssl-ruby     4.2                 OpenSSL interface for Ruby
ii  libopenssl-ruby1.8  1.8.7.249-2         OpenSSL interface for Ruby 1.8
ii  libssl-dev          0.9.8k-7ubuntu8.6   SSL development libraries, header files and
ii  libssl0.9.8         0.9.8k-7ubuntu8.6   SSL shared libraries
ii  openssl             0.9.8k-7ubuntu8     Secure Socket Layer (SSL) binary and related
ii  python-openssl      0.10-1              Python wrapper around the OpenSSL library
ii  ssl-cert            1.0.23ubuntu2       simple debconf wrapper for OpenSSL

Hope that helps,

Thanks,

Tom

On Thu, May 19, 2011 at 3:28 PM, tdp2110  wrote:
>
> n00b here, just started playing around with pipes. I'm getting linker errors
> while compiling a simple WordCount example using hadoop-0.20.203 (current
> most recent version) that did not appear for the same code in hadoop-0.20.2
>
> Linker errors of the form: undefined reference to `EVP_sha1' in
> HadoopPipes.cc.
>
> EVP_sha1 (and all of the undefined references I get) are part of the openssl
> library which HadoopPipes.cc from hadoop-0.20.203 uses, but hadoop-0.20.2
> does not.
>
> I've tried adjusting my makefile to link to the ssl libraries, but I'm still
> out of luck. Any ideas would be greatly appreciated. Thanks!
>
> PS, here is my current makefile:
> CC = g++
> HADOOP_INSTALL = /usr/local/hadoop-0.20.203.0
> SSL_INSTALL = /usr/local/ssl
> PLATFORM = Linux-amd64-64
> CPPFLAGS = -m64 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include \
>    -I$(SSL_INSTALL)/include
>
> WordCount: WordCount.cc
>    $(CC) $(CPPFLAGS) $< -Wall -Wextra -L$(SSL_INSTALL)/lib -lssl -lcrypto \
>    -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes -lhadooputils \
>    -lpthread -g -O2 -o $@
>
>
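
A sketch on link order, offered as a possibility rather than a confirmed fix
for this thread: with GNU ld, a static archive is searched only for symbols
that are still undefined when the linker reaches it, so a library generally
has to appear after the code that uses it. In the quoted makefile, -lssl and
-lcrypto come before -lhadooppipes, so references to EVP_sha1 from the pipes
library may be left unresolved. Assuming libhadooppipes is linked as a static
archive, the reordered command could look like the output of the hypothetical
link_line helper below. The paths are the ones from the quoted makefile.

```shell
#!/bin/sh
# Hypothetical helper: print a g++ command whose libraries are ordered so
# that each library comes before the libraries it depends on.
# Default paths are taken from the makefile quoted in this thread.
HADOOP_INSTALL=${HADOOP_INSTALL:-/usr/local/hadoop-0.20.203.0}
SSL_INSTALL=${SSL_INSTALL:-/usr/local/ssl}
PLATFORM=${PLATFORM:-Linux-amd64-64}

link_line() {
    src=$1
    out=$2
    inc="-I$HADOOP_INSTALL/c++/$PLATFORM/include -I$SSL_INSTALL/include"
    libdirs="-L$HADOOP_INSTALL/c++/$PLATFORM/lib -L$SSL_INSTALL/lib"
    # Dependents first, dependencies after: pipes/utils, then ssl/crypto.
    libs="-lhadooppipes -lhadooputils -lssl -lcrypto -lpthread"
    echo "g++ -m64 $inc $src -Wall -Wextra $libdirs $libs -g -O2 -o $out"
}

link_line WordCount.cc WordCount
```

Running the printed command (or moving the -l flags in the makefile the same
way) is a cheap experiment; if the undefined references remain, the cause is
somewhere else.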


Next SF HUG: June 8, at RichRelevance

2011-05-19 Thread Aaron Kimball
We had a great time after the Cloudera hackathon last week with our monthly
SF Hadoop User Group meetup. I'd like to thank Cloudera again for hosting
such a successful event.

Our next meetup will be held Wednesday, June 8, from 6pm to 8pm.

This meetup will be hosted by our friends at RichRelevance. Their office is
at 275 Battery St. #1150, though we'll be using their 9th floor conference
space.

As usual, we will use the discussion-based "unconference" format. At the
beginning of the meetup we will collaboratively construct an agenda
consisting of several discussion breakout groups. All participants may
propose a topic and volunteer to facilitate a discussion. All Hadoop-related
topics are encouraged, and all members of the Hadoop community are welcome.

Event schedule:

   - 6pm - Welcome
   - 6:30pm - Introductions; start creating agenda
   - Breakout sessions begin as soon as we're ready
   - 8pm - Conclusion


Food and refreshments will be provided, courtesy of RichRelevance.

If you're going to attend, please RSVP at http://bit.ly/kxaJqa.

Hope to see you all there!
- Aaron Kimball


Re: some guidance needed

2011-05-19 Thread Robert Burrell Donkin
On Thu, May 19, 2011 at 12:04 PM, Ioan Eugen Stan  wrote:
> I have forwarded this discussion to my mentors so they are informed

(I've hopped onto this list so no need to remember to copy me into the
thread ;-)



> Eric, one of my mentors, suggested I use Gora for
> this, and after a quick look at Gora I saw that it is an ORM for HBase
> and Cassandra which will allow me to switch between them. The downside
> is that Gora is still incubating, so advice on whether to
> use it is welcome. I will also ask on the Gora mailing list
> to see how things are there.

(I suspect there will be a measure of experimentation required in this
project, so don't be afraid to try a spike or two)

>>> I would encourage you to look at a system like HBase for your mail
>>> backend. HDFS doesn't work well with lots of little files, and also
>>> doesn't support random update, so existing formats like Maildir
>>> wouldn't be a good fit.

(Apache James is closer to the Microsoft Exchange space than to traditional
*nix mail user agents)

> I don't think I understand correctly what you mean by random updates.
> E-mails are immutable so once written they are not going to be
> updated. But if you are referring to the fact that lots of (small)
> files will be written in a directory and that this can be a problem
> then I get it. This will also mean that mailbox format (all emails in
> one file) will be more inappropriate than Maildir. But since e-mails
> are immutable and adding a mail to the mailbox means appending a small
> piece of data to the file this should not be a problem if Hadoop has
> append.

Essentially, there are two classes of data that mail storage requires:

1. read-only MIME documents (mail messages) embedding meta-data (headers)
2. read-write meta-data sets about each document, including flags for
each (virtual) mail directory containing the document

The documents are searched rarely. The meta-data sets are read often
but written rarely.

I suspect that emails are relatively small in Hadoop terms, and are
often numerous. Might be interesting to see how a tuned HDFS instance
performs when storing large numbers of small MIME documents. Should be
easy enough to set up an experiment to benchmark. (I wonder whether a
RESTful distributed storage solution might end up working better.)

I suspect that the read-write meta-data sets will need HBase (or
Cassandra). Would need to think carefully about design, I think.

> The presentation on Vimeo stated that HDFS 0.19 did not have append.
> I don't know yet what the status of that is, but things are a little
> brighter. You could have a mailbox file that could grow to a very
> large size. This would put all of the user's emails into one big file
> that is easy to manage; the only thing missing is
> fetching the emails. Since emails are appended to the file (inbox) as
> they come, and you are usually interested in the latest emails
> received, you could just read the tail of the file and do some indexing
> based on that.

I'm not hopeful about adopting an append based approach. (Might be
made to work but I suspect that the locking required for IMAP or POP3
is likely to kill performance.)

Robert
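
The small-file benchmark suggested above could start from something like the
following sketch. It only prepares test data: make_small_mails is a
hypothetical helper, and the message contents and file naming are made up for
illustration, not taken from James.

```shell
#!/bin/sh
# Sketch: generate COUNT small MIME-like messages under DIR so that an
# upload to HDFS can be timed. make_small_mails is a hypothetical helper.
make_small_mails() {
    dir=$1
    count=$2
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # One tiny text/plain message per file, numbered msg-1.eml, msg-2.eml, ...
        printf 'From: sender%d@example.org\nSubject: test message %d\nMIME-Version: 1.0\nContent-Type: text/plain\n\nbody %d\n' \
            "$i" "$i" "$i" > "$dir/msg-$i.eml"
        i=$((i + 1))
    done
}
```

For example, after make_small_mails /tmp/mails 10000, timing
"hadoop fs -put /tmp/mails /bench/mails" (the HDFS target path is an
assumption) gives a first number to compare against an HBase-backed store.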


Re: Hadoop and WikiLeaks

2011-05-19 Thread Edward Capriolo
On Thu, May 19, 2011 at 11:54 AM, Ted Dunning  wrote:

> ZK started as sub-project of Hadoop.
>
> On Thu, May 19, 2011 at 7:27 AM, M. C. Srivas  wrote:
>
> > Interesting to note that Cassandra and ZK are now considered Hadoop
> > projects.
> >
> > They were independent of Hadoop before the recent update.
> >
> >

Cassandra is not considered to be a hadoop project or sub-project. The site
mentions "Other Hadoop-related projects at Apache include". The relation is
that Cassandra has Input and Output formats and other support.


Re: Can I use InputSampler.RandomSampler on data with non-Text keys?

2011-05-19 Thread Joey Echeverria
Filing a bug is a great idea. InputSampler is in the MapReduce Hadoop
sub-project, which has its own Jira project:

https://issues.apache.org/jira/browse/MAPREDUCE

-Joey

On Thu, May 19, 2011 at 9:28 AM, W.P. McNeill  wrote:
> Should I file a bug then?  Do I do that here?
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Re: Can I use InputSampler.RandomSampler on data with non-Text keys?

2011-05-19 Thread W.P. McNeill
Should I file a bug then?  Do I do that here?


Re: Hadoop and WikiLeaks

2011-05-19 Thread Ted Dunning
ZK started as sub-project of Hadoop.

On Thu, May 19, 2011 at 7:27 AM, M. C. Srivas  wrote:

> Interesting to note that Cassandra and ZK are now considered Hadoop
> projects.
>
> They were independent of Hadoop before the recent update.
>
>
> On Thu, May 19, 2011 at 4:18 AM, Steve Loughran  wrote:
>
> > On 18/05/11 18:05, javam...@cox.net wrote:
> >
> >> Yes!
> >>
> >> -Pete
> >>
> >>  Edward Capriolo  wrote:
> >>
> >> =
> >> http://hadoop.apache.org/#What+Is+Apache%E2%84%A2+Hadoop%E2%84%A2%3F
> >>
> >> March 2011 - Apache Hadoop takes top prize at Media Guardian Innovation
> >> Awards
> >>
> >> The Hadoop project won the "innovator of the year" award from the UK's
> >> Guardian newspaper, where it was described as "had the potential as a
> >> greater catalyst for innovation than other nominees including WikiLeaks
> >> and the iPad."
> >>
> >> Does this copy text bother anyone else? Sure winning any award is great
> >> but does hadoop want to be associated with "innovation" like WikiLeaks?
> >>
> >
> >
> > Ian updated the page yesterday with changes I'd put in for trademarks, and
> > I added this news quote directly from the paper. We could strip out the
> > quote easily enough.
> >
> >
>


Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Joey Echeverria
The hadoop script in 0.20.x probably has all of the features that the
hdfs script in 0.21.x has (plus the functionality of the mapred
script). Give this a try and see if you get any errors.

-Joey

On Thu, May 19, 2011 at 8:31 AM, Gabriele Kahlout
 wrote:
> that simple? No changes to hdfs-config.sh?
>
> What about all the other stuff in the hdfs?
> For example the script calls hdfs dfs , like that won't it crash?
>
> elif [ "$COMMAND" = "dfs" ] ; then
>  CLASS=org.apache.hadoop.fs.FsShell
>

Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Gabriele Kahlout
That simple? No changes to hdfs-config.sh?

What about all the other stuff in the hdfs script?
For example, the script calls hdfs dfs; won't that crash?

elif [ "$COMMAND" = "dfs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell

On Thu, May 19, 2011 at 5:26 PM, Joey Echeverria  wrote:

> I would just write your own hdfs script that has the following:
>
> #!/bin/sh
>
> export HADOOP_HOME=/path/to/hadoop
> exec ${HADOOP_HOME}/bin/hadoop "$@"
>
> -Joey
>

Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Joey Echeverria
I would just write your own hdfs script that has the following:

#!/bin/sh

export HADOOP_HOME=/path/to/hadoop
exec "${HADOOP_HOME}/bin/hadoop" "$@"

-Joey
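
The wrapper above can live in any directory you own, so no write access to
$HADOOP_HOME/bin is needed. A minimal sketch of installing it that way
follows. install_hdfs_wrapper is a hypothetical helper, and it assumes the
0.20.x hadoop script accepts the same sub-commands (such as dfs) that the
calling script passes to hdfs.

```shell
#!/bin/sh
# Sketch: write an "hdfs" wrapper into a directory you can write to.
# install_hdfs_wrapper is a hypothetical helper, not part of Hadoop.
install_hdfs_wrapper() {
    hadoop_home=$1   # where the 0.20.x install lives
    target_dir=$2    # a directory on your PATH that you own
    mkdir -p "$target_dir" || return 1
    # The here-document expands $hadoop_home now; escaped variables are
    # written literally and expand when the wrapper itself runs.
    cat > "$target_dir/hdfs" <<EOF
#!/bin/sh
HADOOP_HOME="$hadoop_home"
export HADOOP_HOME
exec "\$HADOOP_HOME/bin/hadoop" "\$@"
EOF
    chmod +x "$target_dir/hdfs"
}
```

For example, install_hdfs_wrapper /path/to/hadoop "$HOME/bin" followed by
putting $HOME/bin at the front of PATH lets an unmodified script that runs
"hdfs dfs -ls /" reach the hadoop script instead.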

On Thu, May 19, 2011 at 8:10 AM, Gabriele Kahlout
 wrote:
> because I've an immutable script written for hadoop that uses hdfs.
>

Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Gabriele Kahlout
because I've an immutable script written for hadoop that uses hdfs.

On Thu, May 19, 2011 at 5:02 PM, Joey Echeverria  wrote:

> Why do you need the hdfs script? Typically 0.20.x is used with just the
> hadoop script.
>
> -Joey

Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Joey Echeverria
Why do you need the hdfs script? Typically 0.20.x is used with just the
hadoop script.

-Joey
On May 19, 2011 8:00 AM, "Gabriele Kahlout" 
wrote:
> $ hadoop version
> Hadoop 0.20.3-SNAPSHOT
> Subversion
> http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append-r
> 1041718
> Compiled by hammer on Mon Dec 6 17:38:16 CET 2010
>
>
> On Thu, May 19, 2011 at 4:55 PM, Joey Echeverria 
wrote:
>
>> What version of hadoop is installed?
>>
>> -Joey
>> On May 19, 2011 7:49 AM, "Gabriele Kahlout" 
>> wrote:
>> > I said I don't have write access (~ the administrator with write access
>> > will not place the script for me).
>> >
>> > On Thu, May 19, 2011 at 3:56 PM, Niels Basjes  wrote:
>> >
>> >> So why don't you ask for someone with write access to put the file
>> there?
>> >>
>> >> 2011/5/19 Gabriele Kahlout :
>> >> > so your question is, why do you have the problem in the first place?
>> >> > because it's not in $HADOOP_HOME/bin (older hadoop) and I don't have
>> >> > write access.
>> >> >
>> >> > On Thu, May 19, 2011 at 3:33 PM, Joey Echeverria 
>> >> wrote:
>> >> >
>> >> >> Why do you need to move the script from $HADOOP_HOME/bin?
>> >> >>
>> >> >> Can't you just symlink it or write a script which runs the
original?
>> >> >>
>> >> >> -Joey
>> >> >>
>> >> >> On May 19, 2011, at 4:15, Gabriele Kahlout <
gabri...@mysimpatico.com
>> >
>> >> >> wrote:
>> >> >>
>> >> >> > I'm still having the following problem, any suggestions?
>> >> >> >
>> >> >> > I'm trying to modify the
>> >> >> > hdfs<
>> >> >>
>> >>
>>
>>
http://svn.apache.org/viewvc/hadoop/hdfs/tags/release-0.21.0/bin/hdfs?view=markup
>> >> >> >script
>> >> >> > so that it still functions although not located in
$HADOOP_HOME/bin
>> >> >> > anymore, but when I execute the modified hdfs I get:
>> >> >> >
>> >> >> > hdfs: line 110: exec: org.apache.hadoop.fs.FsShell: not found
>> >> >> >
>> >> >> > line 110 is:
>> >> >> >
>> >> >> > exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
>> >> >> >
>> >> >> > I've highlighted the changes I made to the script:
>> >> >> >
>> >> >> > bin=*"$HADOOP_HOME"/bin # was* dirname "$0" bin=cd "$bin"; pwd
>> >> >> >
>> >> >> > ./*hdfs-config.sh # was .* "$bin"/hdfs-config.sh
>> >> >> >
>> >> >> >
>> >> >> > On Mon, May 16, 2011 at 12:20 PM, Gabriele Kahlout <
>> >> >> gabri...@mysimpatico.com
>> >> >> >> wrote:
>> >> >> >
>> >> >> >> http://stackoverflow.com/q/6015818/300248
>> >> >> >>
>> >> >> >> --
>> >> >> >> Regards,
>> >> >> >> K. Gabriele
>> >> >> >>
>> >> >> >> --- unchanged since 20/9/10 ---
>> >> >> >> P.S. If the subject contains "[LON]" or the addressee
acknowledges
>> >> the
>> >> >> >> receipt within 48 hours then I don't resend the email.
>> >> >> >> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x,
this)
>> ∧
>> >> >> >> time(x) < Now + 48h) ⇒ ¬resend(I, this).
>> >> >> >>
>> >> >> >> If an email is sent by a sender that is not a trusted contact or
>> the
>> >> >> email
>> >> >> >> does not contain a valid code then the email is not received. A
>> valid
>> >> >> code
>> >> >> >> starts with a hyphen and ends with "X".
>> >> >> >> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈
subject(x)
>> ∧
>> >> y ∈
>> >> >> >> L(-[a-z]+[0-9]X)).
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Regards,
>> >> >> > K. Gabriele
>> >> >> >
>> >> >> > --- unchanged since 20/9/10 ---
>> >> >> > P.S. If the subject contains "[LON]" or the addressee
acknowledges
>> the
>> >> >> > receipt within 48 hours then I don't resend the email.
>> >> >> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x,
this)
>> ∧
>> >> >> time(x)
>> >> >> > < Now + 48h) ⇒ ¬resend(I, this).
>> >> >> >
>> >> >> > If an email is sent by a sender that is not a trusted contact or
>> the
>> >> >> email
>> >> >> > does not contain a valid code then the email is not received. A
>> valid
>> >> >> code
>> >> >> > starts with a hyphen and ends with "X".
>> >> >> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈
subject(x)
>> ∧
>> y
>> >> ∈
>> >> >> > L(-[a-z]+[0-9]X)).
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Regards,
>> >> > K. Gabriele
>> >> >
>> >> > --- unchanged since 20/9/10 ---
>> >> > P.S. If the subject contains "[LON]" or the addressee acknowledges
the
>> >> > receipt within 48 hours then I don't resend the email.
>> >> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>> >> time(x)
>> >> > < Now + 48h) ⇒ ¬resend(I, this).
>> >> >
>> >> > If an email is sent by a sender that is not a trusted contact or the
>> >> email
>> >> > does not contain a valid code then the email is not received. A
valid
>> >> code
>> >> > starts with a hyphen and ends with "X".
>> >> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧
y
>> ∈
>> >> > L(-[a-z]+[0-9]X)).
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Met vriendelijke groeten,
>> >>
>> >> Niels Basjes
>> >>
>> >
>> >
>> >
>> > --
>> > Regards,
>> > K. Gabriele
>> >
>> > --- unchanged since 20/9/10

Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Gabriele Kahlout
$ hadoop version
Hadoop 0.20.3-SNAPSHOT
Subversion
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append -r
1041718
Compiled by hammer on Mon Dec  6 17:38:16 CET 2010


On Thu, May 19, 2011 at 4:55 PM, Joey Echeverria  wrote:

> What version of hadoop is installed?
>
> -Joey

Re: Applications creates bigger output than input?

2011-05-19 Thread Robert Evans
I'm not sure if this has been mentioned, but in machine learning with
text-based documents, the first stage is often a glorified word count,
except that much of the time it counts n-grams rather than single words. So:

Map Input:
"Hello this is a test"

Map Output:
"Hello"
"this"
"is"
"a"
"test"
"Hello" "this"
"this" "is"
"is" "a"
"a" "test"
...


You may also be extracting all kinds of other features from the text, but the
tokenization/n-gram step is not that CPU intensive.

--Bobby Evans
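
As an illustration of the expansion described above, here is a minimal Python sketch. It is not from the thread: the function name `ngrams` and the `max_n` parameter are made up for this example, and it splits on whitespace only, which a real tokenizer would not. It emits every word n-gram up to `max_n` so the growth of map output relative to map input is easy to see.

```python
def ngrams(text, max_n=2):
    """Return all word n-grams of length 1..max_n, shortest first.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, with no lowercasing or punctuation handling.
    """
    tokens = text.split()
    out = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            out.append(tuple(tokens[i:i + n]))
    return out


line = "Hello this is a test"
records = ngrams(line, max_n=2)

for record in records:
    print(" ".join(record))

# Compare the size of the input line with the size of the emitted records.
input_chars = len(line)
output_chars = sum(len(" ".join(r)) for r in records)
print(input_chars, "characters in,", output_chars, "characters out")
```

For the five-word line this gives 9 records (5 unigrams and 4 bigrams), and raising `max_n` adds more records per line, so the output keeps growing while the per-record work stays cheap.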

On 5/19/11 3:06 AM, "elton sky"  wrote:

Hello,
I am picking up this topic again, because what I am looking for is something
that is not CPU bound. Augmenting data for ETL and generating an index are
good examples. Neither of them requires much CPU time on the map side. The
main bottleneck for them is shuffle and merge.

Market basket analysis is CPU intensive in the map phase, because it samples
all possible combinations of items.

I am still looking for more applications that create bigger output than input
and are not CPU bound.
Any further ideas? I would appreciate them.


On Tue, May 3, 2011 at 3:10 AM, Steve Loughran  wrote:

> On 30/04/2011 05:31, elton sky wrote:
>
>> Thank you for suggestions:
>>
>> Weblog analysis, market basket analysis and generating search index.
>>
>> I guess for these applications we need more reduces than maps, for
>> handling large intermediate output, isn't it. Besides, the input split
>> for map should be smaller than usual, because the workload for spill and
>> merge on map's local disk is heavy.
>>
>
> any form of rendering can generate very large images
>
> see: http://www.hpl.hp.com/techreports/2009/HPL-2009-345.pdf
>
>
>



Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Joey Echeverria
What version of hadoop is installed?

-Joey
On May 19, 2011 7:49 AM, "Gabriele Kahlout" wrote:
> I said I don't have write access (~ the administrator with write access
> will not place the script for me).


Re: Hadoop and WikiLeaks

2011-05-19 Thread James Seigel
All your (code)base are belong to Hadoop?

James
On 2011-05-19, at 8:27 AM, M. C. Srivas wrote:

> Interesting to note that Cassandra and ZK are now considered Hadoop
> projects.
> 
> They were independent of Hadoop before the recent update.



Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Gabriele Kahlout
I said I don't have write access (~ the administrator with write access will
not place the script for me).

On Thu, May 19, 2011 at 3:56 PM, Niels Basjes  wrote:

> So why don't you ask for someone with write access to put the file there?



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Hadoop and WikiLeaks

2011-05-19 Thread M. C. Srivas
Interesting to note that Cassandra and ZK are now considered Hadoop
projects.

They were independent of Hadoop before the recent update.


On Thu, May 19, 2011 at 4:18 AM, Steve Loughran  wrote:

> On 18/05/11 18:05, javam...@cox.net wrote:
>
>> Yes!
>>
>> -Pete
>>
>>  Edward Capriolo  wrote:
>>
>> =
>> http://hadoop.apache.org/#What+Is+Apache%E2%84%A2+Hadoop%E2%84%A2%3F
>>
>> March 2011 - Apache Hadoop takes top prize at Media Guardian Innovation
>> Awards
>>
>> The Hadoop project won the "innovator of the year" award from the UK's
>> Guardian newspaper, where it was described as "had the potential as a
>> greater catalyst for innovation than other nominees including WikiLeaks
>> and the iPad."
>>
>> Does this copy text bother anyone else? Sure winning any award is great
>> but does hadoop want to be associated with "innovation" like WikiLeaks?
>>
>
>
> Ian updated the page yesterday with changes I'd put in for trademarks, and
> I added this news quote directly from the paper. We could strip out the
> quote easily enough.
>
>


Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Niels Basjes
So why don't you ask for someone with write access to put the file there?

2011/5/19 Gabriele Kahlout :
> so your question is, why do you have the problem in the first place?
> because it's not in $HADOOP_HOME/bin (older hadoop) and I don't have write
> access.



-- 
Met vriendelijke groeten,

Niels Basjes


0.20.0 Documentation Error

2011-05-19 Thread Jones, Nick
Others may have run into this, but the documentation on setting up M/R queue
ACLs has a minor error:

'mapred.queue.queue-name.acl-administer-job' should be
'mapred.queue.queue-name.acl-administer-jobs' on
http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html#Configuring+the+Hadoop+Daemons.

Nick



Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Gabriele Kahlout
so your question is, why do you have the problem in the first place?
because it's not in $HADOOP_HOME/bin (older hadoop) and I don't have write
access.

On Thu, May 19, 2011 at 3:33 PM, Joey Echeverria  wrote:

> Why do you need to move the script from $HADOOP_HOME/bin?
>
> Can't you just symlink it or write a script which runs the original?
>
> -Joey



-- 
Regards,
K. Gabriele



Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Joey Echeverria
Why do you need to move the script from $HADOOP_HOME/bin?

Can't you just symlink it or write a script which runs the original?

-Joey
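
The "script which runs the original" idea can be sketched as follows. This is a hedged illustration, not something posted in the thread: it assumes the stock `hadoop` launcher exists at `$HADOOP_HOME/bin/hadoop` (the 0.20.x layout mentioned elsewhere in this thread), and the names `build_command` and `main` are invented here. The wrapper lives wherever the user has write access and simply hands its arguments to the untouched original.

```python
import os
import sys


def build_command(hadoop_home, args):
    """Return the argv for the stock launcher under hadoop_home/bin.

    Hypothetical helper: the launcher path is an assumption about the
    installation layout, not something read from Hadoop itself.
    """
    launcher = os.path.join(hadoop_home, "bin", "hadoop")
    return [launcher] + list(args)


def main(argv):
    # HADOOP_HOME must already point at the read-only installation.
    hadoop_home = os.environ["HADOOP_HOME"]
    cmd = build_command(hadoop_home, argv)
    # Replace this process with the original script, so exit codes and
    # signals behave as if the original had been invoked directly.
    os.execv(cmd[0], cmd)


# Show the command that would be run, without executing anything.
print(build_command("/opt/hadoop", ["fs", "-ls", "/"]))
```

`main` is defined but deliberately not called here. Saved as a file in a writable directory with `main(sys.argv[1:])` added at the bottom, it would forward something like `fs -ls /` to the original launcher without copying or editing any file under `$HADOOP_HOME/bin`.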

On May 19, 2011, at 4:15, Gabriele Kahlout  wrote:

> I'm still having the following problem, any suggestions?
> 
> I'm trying to modify the hdfs script
> <http://svn.apache.org/viewvc/hadoop/hdfs/tags/release-0.21.0/bin/hdfs?view=markup>
> so that it still functions although not located in $HADOOP_HOME/bin
> anymore, but when I execute the modified hdfs I get:
> 
> hdfs: line 110: exec: org.apache.hadoop.fs.FsShell: not found
> 
> line 110 is:
> 
> exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
> 
> I've highlighted the changes I made to the script:
> 
> bin="$HADOOP_HOME"/bin # was `dirname "$0"`
> bin=`cd "$bin"; pwd`
> 
> ./hdfs-config.sh # was . "$bin"/hdfs-config.sh
> 
> 
> On Mon, May 16, 2011 at 12:20 PM, Gabriele Kahlout wrote:
> 
>> http://stackoverflow.com/q/6015818/300248


Re: Applications creates bigger output than input?

2011-05-19 Thread Niels Basjes
Something I've seen in the past is code that has the input
   "something"
and outputs
   "s"
   "so"
   "som"
   "some"
   "somet"
   "someth"
   "somethi"
   "somethin"
   "something"

So the output number of records is the same as the length of the input text.
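
A minimal sketch of that pattern in plain Java, outside of any MapReduce job. The class and method names are hypothetical and only illustrate the idea: a map-style function that emits one record per prefix, so an input of n characters produces n records and n(n+1)/2 characters of output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from any job in this thread.
class PrefixExpander {

    // Emits every non-empty prefix of the input, shortest first.
    static List<String> prefixes(String input) {
        List<String> out = new ArrayList<>(input.length());
        for (int end = 1; end <= input.length(); end++) {
            out.add(input.substring(0, end));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "something";
        List<String> records = prefixes(input);

        // Sum the size of all emitted records to compare with the input size.
        long outputChars = 0;
        for (String record : records) {
            outputChars += record.length();
        }
        System.out.println("input chars:  " + input.length());
        System.out.println("records out:  " + records.size());
        System.out.println("output chars: " + outputChars);
    }
}
```

The map side does almost no computation here; nearly all of the cost is in writing and moving the expanded output.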

Niels

2011/5/19 elton sky:
> Hello,
> I am picking up this topic again because what I am looking for is something
> that is not CPU bound. Augmenting data for ETL and generating an index are
> good examples. Neither of them requires much CPU time on the map side. The
> main bottleneck for them is shuffle and merge.
>
> Market basket analysis is CPU intensive in the map phase, because it samples
> all possible combinations of items.
>
> I am still looking for more applications that create bigger output than
> input and are not CPU bound.
> Any further ideas? I would appreciate them.
>
>
> On Tue, May 3, 2011 at 3:10 AM, Steve Loughran  wrote:
>
>> On 30/04/2011 05:31, elton sky wrote:
>>
>>> Thank you for suggestions:
>>>
>>> Weblog analysis, market basket analysis and generating search index.
>>>
>>> I guess for these applications we need more reduces than maps, for handling
>>> large intermediate output, don't we? Besides, the input split for map should
>>> be smaller than usual, because the workload for spill and merge on the map's
>>> local disk is heavy.
>>>
>>
>> any form of rendering can generate very large images
>>
>> see: http://www.hpl.hp.com/techreports/2009/HPL-2009-345.pdf
>>
>>
>>
>



-- 
Met vriendelijke groeten,

Niels Basjes


Re: Reducer granularity and starvation

2011-05-19 Thread Michel Segel
Fair scheduler won't help unless you set it to allow preemptive executions 
which may not be a good thing...

Fair scheduler will wait until the current task completes before assigning a 
new task to the open slot.  So if you have a long running job... You're SOL.

A combiner will definitely help but you will still have the issue of long 
running jobs. You could put your job in a queue that limits the number of 
slots... But then you will definitely increase the time to run your job.

If you could suspend a task... But that's a non-trivial solution...

Sent from a remote device. Please excuse any typos...

Mike Segel

On May 18, 2011, at 5:04 PM, "W.P. McNeill"  wrote:

> I'm using fair scheduler and JVM reuse.  It is just plain a big job.
> 
> I'm not using a combiner right now, but that's something to look at.
> 
> What about bumping the mapred.reduce.tasks up to some huge number?  I think
> that shouldn't make a difference, but I'm hearing conflicting information on
> this.


Re: Hadoop and WikiLeaks

2011-05-19 Thread Steve Loughran

On 18/05/11 18:05, javam...@cox.net wrote:

Yes!

-Pete

 Edward Capriolo  wrote:

http://hadoop.apache.org/#What+Is+Apache%E2%84%A2+Hadoop%E2%84%A2%3F

March 2011 - Apache Hadoop takes top prize at Media Guardian Innovation
Awards

The Hadoop project won the "innovator of the year" award from the UK's
Guardian newspaper, where it was described as "had the potential as a
greater catalyst for innovation than other nominees including WikiLeaks and
the iPad."

Does this copy text bother anyone else? Sure winning any award is great but
does hadoop want to be associated with "innovation" like WikiLeaks?



Ian updated the page yesterday with changes I'd put in for trademarks, 
and I added this news quote directly from the paper. We could strip out 
the quote easily enough.




Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Gabriele Kahlout
I'm still having the following problem, any suggestions?

I'm trying to modify the hdfs script so that it still functions although it
is no longer located in $HADOOP_HOME/bin, but when I execute the modified
hdfs I get:

hdfs: line 110: exec: org.apache.hadoop.fs.FsShell: not found

line 110 is:

exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"

I've highlighted the changes I made to the script:

bin="$HADOOP_HOME"/bin   # was: bin=`dirname "$0"`; bin=`cd "$bin"; pwd`

./hdfs-config.sh   # was: . "$bin"/hdfs-config.sh


On Mon, May 16, 2011 at 12:20 PM, Gabriele Kahlout  wrote:

> http://stackoverflow.com/q/6015818/300248
>
> --
> Regards,
> K. Gabriele
>
>
>


-- 
Regards,
K. Gabriele



Re: some guidance needed

2011-05-19 Thread Ioan Eugen Stan
I have forwarded this discussion to my mentors so they are informed
and I hope they will provide better input regarding email storage.

> I second what Todd said, even with FuseHDFS, mounting HDFS as a regular file
> system, it won't give you the immediate response about the file status that
> you need. I believe Google implemented Gmail with HBase. Here is an example
> of implementing a mail store with Cassandra:
> http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf
>
> Mark

Thanks Mark, I will look into that. I am currently watching the Cloudera
Hadoop Training [1] to get a better view of how things work.

I have one question: what is the defining difference between Cassandra
and HBase? Also, Eric, one of my mentors, suggested I use Gora for
this, and after a quick look at Gora I saw that it is an ORM for HBase
and Cassandra, which would allow me to switch between them. The downside
is that Gora is still incubating, so any advice on whether to use it
is welcome. I will also ask on the Gora mailing list
to see how things are there.

>> I would encourage you to look at a system like HBase for your mail
>> backend. HDFS doesn't work well with lots of little files, and also
>> doesn't support random update, so existing formats like Maildir
>> wouldn't be a good fit.

I don't think I understand correctly what you mean by random updates.
E-mails are immutable so once written they are not going to be
updated. But if you are referring to the fact that lots of (small)
files will be written in a directory and that this can be a problem
then I get it. This will also mean that mailbox format (all emails in
one file) will be more inappropriate than Maildir. But since e-mails
are immutable and adding a mail to the mailbox means appending a small
piece of data to the file this should not be a problem if Hadoop has
append.

The presentation on Vimeo stated that HDFS 0.19 did not have append.
I don't know yet what the status of that is, but things look a little
brighter. You could have a mailbox file that grows to a very
large size. This would put all of a user's emails into one big file
that is easy to manage; the only thing missing is a way of
fetching the emails. Since emails are appended to the file (inbox) as
they arrive, and you are usually interested in the latest emails
received, you could just read the tail of the file and do some indexing
based on that. Should I also post this on the HDFS mailing list?
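
To make the "append, then read the tail" idea concrete, here is a rough sketch against a local file using only the JDK. It is an illustration under assumptions, not HDFS code: the class and method names are made up, and whether the same pattern works on HDFS depends on append support in the version used.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration on a local file; names are not from the thread.
class TailReader {

    // Returns up to maxBytes from the end of the file, decoded as UTF-8.
    // Note: a multi-byte character could be cut at the start of the window.
    static String readTail(Path file, int maxBytes) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            long start = Math.max(0, length - maxBytes);
            byte[] buffer = new byte[(int) (length - start)];
            raf.seek(start);       // jump straight to the tail, skipping old mail
            raf.readFully(buffer);
            return new String(buffer, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends two messages to a temporary "inbox" and reads back the newest one.
    static String demo() {
        try {
            Path inbox = Files.createTempFile("inbox", ".mbox");
            byte[] first = "first message\n".getBytes(StandardCharsets.UTF_8);
            byte[] second = "second message\n".getBytes(StandardCharsets.UTF_8);
            Files.write(inbox, first, StandardOpenOption.APPEND);
            Files.write(inbox, second, StandardOpenOption.APPEND);
            String tail = readTail(inbox, second.length);
            Files.delete(inbox);
            return tail;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

A real mailbox would also need an index of message offsets, since a fixed-size tail window does not line up with message boundaries.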

I'm talking without real experience with Hadoop so shut me up if I'm wrong.

>> --
>> Todd Lipcon
>> Software Engineer, Cloudera

You are from Cloudera, nice. Answers straight from the source :).

[1] http://vimeo.com/3591321

Thanks,

-- 
Ioan-Eugen Stan


Re: problem using getLocalCacheArchives in DistributeCache

2011-05-19 Thread Diego Ceccarelli
Dear all,

I finally solved the DistributedCache issue using a symlink.
Before launching the jobs I put:

//activate symlink
DistributedCache.createSymlink(jobConf);
URI archiveUri = new URI(hdfsArchivePath+"#symbolicName");
DistributedCache.addCacheArchive(archiveUri, jobConf);


Then in the jobs I used:

URL resource = jobConf.getResource("#symbolicName");

Now, "resource" contains the path of the directory where the
archive is locally decompressed.
Hope it helps.
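
For readers unfamiliar with the "#symbolicName" suffix: it is the fragment part of the URI, which DistributedCache is documented to use as the symlink name. The sketch below uses only java.net.URI to show how such a URI splits into archive path and link name; the class and method names are hypothetical, and it does not call any Hadoop API.

```java
import java.net.URI;

// Hypothetical helper that only inspects the URI; it does not touch Hadoop.
class CacheUriDemo {

    // The part before '#': where the archive lives.
    static String archivePath(String cacheUri) {
        return URI.create(cacheUri).getPath();
    }

    // The part after '#': the name intended for the symlink.
    static String symlinkName(String cacheUri) {
        return URI.create(cacheUri).getFragment();
    }

    public static void main(String[] args) {
        String cacheUri = "/user/myuser/myproject/mapfile.zip#symbolicName";
        System.out.println("archive: " + archivePath(cacheUri));
        System.out.println("symlink: " + symlinkName(cacheUri));
    }
}
```

If no fragment is given, getFragment() returns null, so there would be no link name to create.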

Best,
Diego









On Mon, May 16, 2011 at 11:00 PM, Diego Ceccarelli
 wrote:
> Hi all,
> I'm trying to distribute a MapFile locally using Hadoop's DistributedCache.
> As The Definitive Guide suggests, since MapFiles are a collection of files
> with a defined directory structure, I zipped it and copied it to HDFS:
>
> bin/hadoop fs -copyFromLocal mapfile.zip /user/myuser/myproject/
>
> and I tried to use the DistributedCache to send a copy of the mapfile
> to each node (as explained in [1]). So I set
>
> DistributedCache.addCacheArchive(new
> Path("/user/myuser/myproject/mapfile.zip").toUri(), jobConf);
>
> and then in the reduce step I put:
>
> Path[] files = DistributedCache.getLocalCacheArchives(conf);
>
> This retrieves the path of the zipped file on the local node, whereas,
> according to [1], I expected to find the extracted archive:
>
> "DistributedCache can be used to distribute simple, read-only
> data/text files and/or more complex types such as archives, jars etc.
> Archives (zip, tar and tgz/tar.gz files) are un-archived at the slave
> nodes."
>
> I also tried to unzip the file, but I never find the files at the path
> where they should be.
> Does anyone know where my mistake is? Could anyone show me some code
> to access a file within an archive locally?
>
> Thanks in advance!
> Diego
>
>
> [1] 
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/filecache/DistributedCache.html
>



-- 
Computers are useless. They can only give you answers.
(Pablo Picasso)
___
Diego Ceccarelli
High Performance Computing Laboratory
Information Science and Technologies Institute (ISTI)
Italian National Research Council (CNR)
Via Moruzzi, 1
56124 - Pisa - Italy

Phone: +39 050 315 3055
Fax: +39 050 315 2040



Re: Problem with running WordCount example

2011-05-19 Thread Ferdy Galema
You should not put the application jar itself on the hdfs (in this case 
hadoop-*-examples.jar).
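
The stack trace in the quoted message shows RunJar opening the jar through java.util.jar.JarFile, which reads from the local filesystem. A small sketch of that behaviour, using only the JDK (the class name and the example path are assumptions for illustration):

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical check, not part of Hadoop: can this path be opened as a local jar?
class LocalJarCheck {

    // JarFile resolves the path on the local disk; a path that exists only
    // in HDFS is not visible to it and opening fails with an IOException.
    static boolean opensAsLocalJar(String path) {
        try (JarFile jar = new JarFile(new File(path))) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed example path; pass a real one as the first argument.
        String path = args.length > 0 ? args[0] : "/user/droy/wordcount_hdfs/WordCount.jar";
        System.out.println(path + " opens as a local jar: " + opensAsLocalJar(path));
    }
}
```

Running it with the path of a jar on the local disk should print true; a path that exists only in HDFS prints false.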


If you run the following command, <input> and <output> are directories on the 
configured filesystem (which is DFS most of the time):

bin/hadoop jar hadoop-*-examples.jar wordcount <input> <output>


On 05/19/2011 09:17 AM, Debapriya Roy wrote:

Hi,

While running the WordCount.java file from hadoop, faced the following error:

Exception in thread "main" java.io.IOException: Error opening job jar: 
/user/droy/wordcount_hdfs/WordCount.jar
 at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
 at java.util.zip.ZipFile.open(Native Method)
 at java.util.zip.ZipFile.<init>(ZipFile.java:114)
 at java.util.jar.JarFile.<init>(JarFile.java:133)
 at java.util.jar.JarFile.<init>(JarFile.java:70)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:88)


Here the hdfs directory is : /user/droy/wordcount_hdfs/


Please advise.


Thanks and Regards,
Debapriya Roy  | ASM(NPR) | Tech  Mahindra
5th Floor,  Techno India Building, Sector-V, Saltlake , Kolkata-700091, INDIA
* Office: (033)40028100  |  Extn:  7509
Email: debapr...@techmahindra.com
www.techmahindra.com


Disclaimer:
  This message and the information contained herein is proprietary and 
confidential and subject to the Tech Mahindra policy statement, you may review 
the policy at http://www.techmahindra.com/Disclaimer.html externally and 
http://tim.techmahindra.com/Disclaimer.html internally within Tech Mahindra.



Re: Applications creates bigger output than input?

2011-05-19 Thread elton sky
Hello,
I am picking up this topic again because what I am looking for is something
that is not CPU bound. Augmenting data for ETL and generating an index are
good examples. Neither of them requires much CPU time on the map side. The
main bottleneck for them is shuffle and merge.

Market basket analysis is CPU intensive in the map phase, because it samples
all possible combinations of items.

I am still looking for more applications that create bigger output than
input and are not CPU bound.
Any further ideas? I would appreciate them.


On Tue, May 3, 2011 at 3:10 AM, Steve Loughran  wrote:

> On 30/04/2011 05:31, elton sky wrote:
>
>> Thank you for suggestions:
>>
>> Weblog analysis, market basket analysis and generating search index.
>>
>> I guess for these applications we need more reduces than maps, for handling
>> large intermediate output, don't we? Besides, the input split for map should
>> be smaller than usual, because the workload for spill and merge on the map's
>> local disk is heavy.
>>
>
> any form of rendering can generate very large images
>
> see: http://www.hpl.hp.com/techreports/2009/HPL-2009-345.pdf
>
>
>


Problem with running WordCount example

2011-05-19 Thread Debapriya Roy
Hi,

While running the WordCount.java file from hadoop, faced the following error:

Exception in thread "main" java.io.IOException: Error opening job jar: 
/user/droy/wordcount_hdfs/WordCount.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:114)
at java.util.jar.JarFile.<init>(JarFile.java:133)
at java.util.jar.JarFile.<init>(JarFile.java:70)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)


Here the hdfs directory is : /user/droy/wordcount_hdfs/


Please advise.


Thanks and Regards,
Debapriya Roy  | ASM(NPR) | Tech  Mahindra
5th Floor,  Techno India Building, Sector-V, Saltlake , Kolkata-700091, INDIA
* Office: (033)40028100  |  Extn:  7509
Email: debapr...@techmahindra.com
www.techmahindra.com

