speedy Google Maps driving directions like implementation

2009-06-10 Thread Lukáš Vlček
Hi,
I am wondering how Google implemented the driving directions function in
Maps. More specifically, how did they make it so fast? I asked a Google
engineer about this, and all he told me is that there are a bunch of
MapReduce cycles involved in the process, but I don't think that is really
the whole truth. How is it possible to implement such a real-time function
in plain MapReduce fashion (from the Hadoop point of view)? The only
possibility I can think of right now is that they either execute the
MapReduce computation only in memory (intermediate results as well as
Reduce-to-next-Map results are kept in some kind of distributed memory)
or they use a different architecture for this.

In simple words: I know there are some tutorials about how to nail down the
SSSP problem with MapReduce, but I cannot believe it could produce results
with the quick response times I experience with Google Maps.
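For what it's worth, the textbook MapReduce formulation of SSSP expands the search frontier by one hop per job, so a route N hops long costs on the order of N job rounds, each writing intermediate results to disk - which is exactly why an interactive response is hard to believe. Below is a minimal in-memory sketch of that round structure (plain Java, toy unweighted graph; all names are made up, and on a real cluster each `round` call would be a separate Hadoop job):

```java
import java.util.*;

public class SsspRounds {
    static final int INF = Integer.MAX_VALUE;

    // One MapReduce-style round: every reached node "emits" dist+1 to its
    // neighbours (the map side), and each node keeps the minimum offer
    // (the reduce side).
    static Map<String, Integer> round(Map<String, List<String>> adj,
                                      Map<String, Integer> dist) {
        Map<String, Integer> next = new HashMap<>(dist);
        for (Map.Entry<String, Integer> e : dist.entrySet()) {
            if (e.getValue() == INF) continue;
            for (String nb : adj.getOrDefault(e.getKey(), List.<String>of())) {
                next.merge(nb, e.getValue() + 1, Math::min);
            }
        }
        return next;
    }

    // Iterate rounds until distances stop changing; in real Hadoop each
    // round is a job whose output goes to HDFS and is re-read by the next.
    static Map<String, Integer> sssp(Map<String, List<String>> adj, String src) {
        Map<String, Integer> dist = new HashMap<>();
        for (String v : adj.keySet()) dist.put(v, INF);
        dist.put(src, 0);
        Map<String, Integer> next;
        while (!(next = round(adj, dist)).equals(dist)) dist = next;
        return dist;
    }

    public static void main(String[] args) {
        // Chain a -> b -> c -> d: four nodes, three hops, three rounds of "jobs".
        Map<String, List<String>> adj = Map.of(
            "a", List.of("b"), "b", List.of("c"),
            "c", List.of("d"), "d", List.<String>of());
        System.out.println(sssp(adj, "a"));
    }
}
```

The per-round job startup and HDFS I/O, not the arithmetic, is what dominates the latency in the Hadoop version - which supports the guess above that an interactive service must keep everything in memory or use a different architecture.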

Any comments, ideas?

Thanks,
Lukas


Re: Chaining Multiple Map reduce jobs.

2009-04-08 Thread Lukáš Vlček
Hi,
I am by no means a Hadoop expert, but I think you cannot start a Map task
until the previous Reduce is finished. This means that you
probably have to store the Map output to disk first (because a) it may
not fit into memory and b) you would risk data loss if the system crashed).
As for job chaining, you can check the JobControl class (
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html)

Also you can look at https://issues.apache.org/jira/browse/HADOOP-3702
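Not the Hadoop API itself, but to illustrate the dependency rule above: chained jobs form a small DAG, and a job may only start once the jobs producing its input have finished - which is the bookkeeping JobControl does for you. A toy scheduler sketch in plain Java (all job names are hypothetical):

```java
import java.util.*;

public class ChainSketch {
    // A "job" is just a name plus the names of the jobs it depends on.
    record Job(String name, List<String> deps) {}

    // Run jobs so that every job starts only after all of its dependencies
    // have completed -- the ordering guarantee job chaining needs.
    static List<String> runInOrder(List<Job> jobs) {
        List<String> done = new ArrayList<>();
        List<Job> pending = new ArrayList<>(jobs);
        while (!pending.isEmpty()) {
            Job ready = pending.stream()
                .filter(j -> done.containsAll(j.deps()))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("cycle in job graph"));
            done.add(ready.name());   // a real job would run its map + reduce here
            pending.remove(ready);
        }
        return done;
    }

    public static void main(String[] args) {
        List<Job> chain = List.of(
            new Job("parse", List.<String>of()),
            new Job("group", List.of("parse")),
            new Job("report", List.of("group")));
        System.out.println(runInOrder(chain)); // dependency order: parse, group, report
    }
}
```

Because each downstream job needs its predecessor's complete, reduced output, the hand-off between jobs has to go through a durable store (HDFS) rather than memory - which is the point made above.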

Regards,
Lukas

On Wed, Apr 8, 2009 at 11:30 PM, asif md  wrote:

> hi everyone,
>
> i have to chain multiple map reduce jobs < actually 2 to 4 jobs >, each of
> the jobs depends on the o/p of preceding job. In the reducer of each job
> I'm
> doing very little < just grouping by key from the maps>. I want to give the
> output of one MapReduce job to the next job without having to go to the
> disk. Does anyone have any ideas on how to do this?
>
> Thanx.
>



-- 
http://blog.lukas-vlcek.com/


Re: Amazon Elastic MapReduce

2009-04-03 Thread Lukáš Vlček
I may be wrong, but I would welcome this. As far as I understand, the hot
topic in cloud computing these days is standardization ... and I would be
happy if Hadoop could be considered a standard for cloud computing
architecture. So the more Amazon pushes Hadoop, the more it could be accepted
by other players in this market (and the better for customers switching
from one cloud provider to another). Just my 2 cents.
Regards,
Lukas

On Fri, Apr 3, 2009 at 4:36 PM, Stuart Sierra
wrote:

> On Thu, Apr 2, 2009 at 4:13 AM, zhang jianfeng  wrote:
> > seems like I should pay for additional money, so why not configure a
> hadoop
> > cluster in EC2 by myself. This already have been automatic using script.
>
> Personally, I'm excited about this.  They're charging a tiny fraction
> above the standard EC2 rate.  I like that the cluster shuts down
> automatically when the job completes -- you don't have to sit around
> and watch it.  Yeah, you can automate that, but it's one more thing to
> think about.
>
> -Stuart
>



-- 
http://blog.lukas-vlcek.com/


Re: Need more detail on Hadoop architecture

2009-03-30 Thread Lukáš Vlček
BTW: there are at least two books: Hadoop: The Definitive
Guide <http://oreilly.com/catalog/9780596521998/> and
Hadoop in Action <http://www.manning.com/lam/>,
both of which I can recommend.

Anyway, a simple web search on the topic should give you a lot of
information.

Lukas

2009/3/30 Lukáš Vlček 

> Sorry ... :-) I was too quick and I didn't notice that you already pointed
> out this link.
>
>
> On Mon, Mar 30, 2009 at 11:57 PM, Lukáš Vlček wrote:
>
>> Hi,
>> This tutorial can be a good start:
>> http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
>>
>> Regards,
>> Lukas
>>
>>
>> On Mon, Mar 30, 2009 at 11:49 PM, I LOVE Hadoop :) <
>> kusanagiyang.had...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I want to know more details on using the Hadoop framework to sort input
>>> data. For example, sorting can be as simple as using the identity map and
>>> reduce classes and just allowing the framework to do its basic work.
>>> From the article
>>> http://hadoop.apache.org/core/docs/r0.19.1/mapred_tutorial.html#Overview,
>>> I learned that a number of classes, such as the comparator and
>>> partitioner, can be involved in the process, but I cannot find more
>>> details on how this takes place.
>>> Could someone help?
>>> Thanks.
>>>
>>> --
>>> Cheers! Hadoop core
>>>
>>
>>
>>
>> --
>> http://blog.lukas-vlcek.com/
>>
>
>
>
> --
> http://blog.lukas-vlcek.com/
>



-- 
http://blog.lukas-vlcek.com/


Re: Need more detail on Hadoop architecture

2009-03-30 Thread Lukáš Vlček
Sorry ... :-) I was too quick and I didn't notice that you already pointed
out this link.

On Mon, Mar 30, 2009 at 11:57 PM, Lukáš Vlček  wrote:

> Hi,
> This tutorial can be a good start:
> http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
>
> Regards,
> Lukas
>
>
> On Mon, Mar 30, 2009 at 11:49 PM, I LOVE Hadoop :) <
> kusanagiyang.had...@gmail.com> wrote:
>
>> Hello,
>>
>> I want to know more details on using the Hadoop framework to sort input
>> data. For example, sorting can be as simple as using the identity map and
>> reduce classes and just allowing the framework to do its basic work.
>> From the article
>> http://hadoop.apache.org/core/docs/r0.19.1/mapred_tutorial.html#Overview,
>> I learned that a number of classes, such as the comparator and
>> partitioner, can be involved in the process, but I cannot find more
>> details on how this takes place.
>> Could someone help?
>> Thanks.
>>
>> --
>> Cheers! Hadoop core
>>
>
>
>
> --
> http://blog.lukas-vlcek.com/
>



-- 
http://blog.lukas-vlcek.com/


Re: Need more detail on Hadoop architecture

2009-03-30 Thread Lukáš Vlček
Hi,
This tutorial can be a good start:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html

Regards,
Lukas

On Mon, Mar 30, 2009 at 11:49 PM, I LOVE Hadoop :) <
kusanagiyang.had...@gmail.com> wrote:

> Hello,
>
> I want to know more details on using the Hadoop framework to sort input
> data. For example, sorting can be as simple as using the identity map and
> reduce classes and just allowing the framework to do its basic work.
> From the article
> http://hadoop.apache.org/core/docs/r0.19.1/mapred_tutorial.html#Overview,
> I learned that a number of classes, such as the comparator and
> partitioner, can be involved in the process, but I cannot find more
> details on how this takes place.
> Could someone help?
> Thanks.
>
> --
> Cheers! Hadoop core
>
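On the comparator/partitioner question above: during the shuffle phase, the partitioner decides which reduce task each key goes to, and the key comparator fixes the order in which keys reach that reducer; with identity map and reduce classes, those two pieces alone do the sorting. A small in-memory sketch of that step (not Hadoop API code; all names are made up):

```java
import java.util.*;

public class ShuffleSketch {
    // Route each key to a reduce partition (Hadoop's default partitioner
    // hashes the key modulo the number of reducers), then sort the keys
    // within each partition -- the job of the key comparator.
    static List<List<String>> shuffle(List<String> keys, int reducers) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < reducers; i++) parts.add(new ArrayList<>());
        for (String k : keys) {
            int p = (k.hashCode() & Integer.MAX_VALUE) % reducers; // partitioner
            parts.get(p).add(k);
        }
        for (List<String> part : parts) {
            part.sort(Comparator.naturalOrder()); // key comparator
        }
        return parts;
    }

    public static void main(String[] args) {
        // Each partition comes out sorted; a total order across partitions
        // would additionally need a range partitioner, as Hadoop's TeraSort uses.
        System.out.println(shuffle(List.of("pear", "apple", "fig", "plum"), 2));
    }
}
```

With hash partitioning each reducer's output file is sorted but the files don't concatenate into one globally sorted result; swapping in a range partitioner is what makes the concatenation fully sorted.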



-- 
http://blog.lukas-vlcek.com/


Re: Cloudera Hadoop and Hive training now free online

2009-03-13 Thread Lukáš Vlček
Hi,
This is excellent!

Do any of these presentations deal specifically with processing tree and
graph data structures? I know that some basics can be found in the fifth
MapReduce lecture here (http://www.youtube.com/watch?v=BT-piFBP4fE),
presented by Aaron Kimball, or here (
http://video.google.com/videoplay?docid=741403180270990805) by Barry Brumit,
but something more detailed, comparing different approaches, would be
really helpful.

Trees are used in many algorithms (not only can they express hierarchy, they
can also be used to compress data and do many other fancy things...). I think
there should be some knowledge about what works well and what does not when
combining MapReduce with trees (or graphs), and that is the information I am
looking for.

Regards,
Lukas

On Fri, Mar 13, 2009 at 9:42 PM, Christophe Bisciglia <
christo...@cloudera.com> wrote:

> Hey there, today we released our basic Hadoop and Hive training
> online. Access is free, and we address questions through Get
> Satisfaction.
>
> Many on this list are surely pros, but when you have friends trying to
> get up to speed, feel free to send this along. We provide a VM so new
> users can start doing the exercises right away.
>
> http://www.cloudera.com/hadoop-training-basic
>
> Cheers,
> Christophe
>


Re: Off topic: web framework for high traffic service

2009-03-04 Thread Lukáš Vlček
Hi Tim,
Thanks for links.

I know this may sound off topic. On the other hand, if you look for example
at the eBay architecture (http://highscalability.com/ebay-architecture),
you can see that some concepts are close to Hadoop-like systems (I mean,
when you want to build something like eBay, you should think hard about how
you balance, cluster, separate, isolate, parallelize and cache things). I
assume some people on this list may know a thing or two about these things...

Regards,
Lukas

On Thu, Mar 5, 2009 at 3:11 AM, Tim Wintle wrote:

> On Wed, 2009-03-04 at 23:14 +0100, Lukáš Vlček wrote:
> > Sorry for off topic question
> It is very off topic.
>
> > Any ideas, best practices, book recommendations, papers, tech talk links
> ...
> I found this a nice little book:
> <
> http://developer.yahoo.net/blog/archives/2008/11/allspaw_capacityplanning.html
> >
>
> and this a nice techtalk on why the servers aren't the most important
> thing to worry about.
> <http://www.youtube.com/watch?v=BTHvs3V8DBA>
>
>
>


-- 
http://blog.lukas-vlcek.com/


Re: Announcing CloudBase-1.2.1 release

2009-03-03 Thread Lukáš Vlček
Hi Taran,
This looks impressive. I quickly looked at the documentation; am I right
that it does not support unique keys and foreign keys for tables?

Regards,
Lukas

On Mon, Mar 2, 2009 at 8:33 PM, Tarandeep Singh  wrote:

> Hi,
>
> We have just released 1.2.1 version of CloudBase on sourceforge-
> http://cloudbase.sourceforge.net
>
> [ CloudBase is a data warehouse system built on top of Hadoop's Map-Reduce
> architecture. It uses ANSI SQL as its query language and comes with a JDBC
> driver. It is developed by Business.com and is released to open source
> community under GNU GPL license]
>
> This release fixes one issue with the 1.2 release- Table Indexing feature
> was not enabled in the 1.2 release. This release fixes this issue.
>
> Also we have updated the svn repository on the sourceforge site and we
> invite contributors to work with us to improve CloudBase. The svn
> repository
> url is-
> https://cloudbase.svn.sourceforge.net/svnroot/cloudbase/trunk
>
> We will be uploading Developer's guide/documentation on the CloudBase
> website very soon. Meanwhile, if someone wants to try compiling the code
> and
> play around with it, please contact me, I can help you get started.
>
> Thanks,
> Taran
>



-- 
http://blog.lukas-vlcek.com/


Re: Finding longest path in a graph

2009-02-01 Thread Lukáš Vlček
Did you look at the paper which is a base for Mahout implementation?
http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf

I think it discusses some theoretical aspects of such a transformation.

Lukas

2009/2/2 Ricky Ho 

> I heard that Mahout addresses the machine learning portion, but I haven't
> looked at any detail yet.
>
> I am looking more from the theory side (how to transform the algorithm into
> the Map/Reduce form) and not necessarily at an implementation.
>
> Rgds,
> Ricky
> -----Original Message-----
> From: Lukáš Vlček [mailto:lukas.vl...@gmail.com]
> Sent: Sunday, February 01, 2009 10:29 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Finding longest path in a graph
>
> Ricky,
> Are you aware of Mahout project?
> http://lucene.apache.org/mahout/
>
> I think this project tries to address some of the algorithms you have
> mentioned.
>
> Regards,
> Lukas
>
> 2009/2/2 Ricky Ho 
>
> > Yes, Map/Reduce model itself is simple.  But transforming a non-trivial
> > algorithm into Map/Reduce is not simple.
> >
> > I wonder if there is any effort from academia to look into building a
> > catalog of how each of our familiar algorithms can be transformed into
> > Map/Reduce, such as ...
> > 1) Sorting
> > 2) Searching (index tree, hash, geo/spacial search)
> > 3) Network/Graph processing (min spanning tree, shortest path, network
> > diameter)
> > 4) Computational Geometry (Ray tracing, Convex Hull)
> > 5) Optimization problem (Maximum flow, Linear programming, Hill climbing)
> > 6) Machine learning (logistic regression, nearest neighbor, cluster,
> > Bayesian classification, decision tree, neural network)
> >
> > It is also important to recognize there are other models in our tool box
> > (besides map/reduce) for parallelizing an algorithm, such as ...
> > a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace)
> > b) Dependency graph / Data flow programming (e.g. Dryad)
> > c) MPI
> > d) Multi-thread / Shared memory model
> > e) ... anything else ...
> >
> > Rgds,
> > Ricky
> >
> > -----Original Message-----
> > From: Lukáš Vlček [mailto:lukas.vl...@gmail.com]
> > Sent: Sunday, February 01, 2009 11:37 AM
> > To: core-user@hadoop.apache.org
> > Subject: Re: Finding longest path in a graph
> >
> > Hi,
> > just my 2 cents.
> > You are right that MapReduce does not fit all problems. Especially
> when
> > it comes to processing graphs. On the other hand I think even in Google
> > they
> > stick with MapReduce for complex graph processing (and they do it in
> Yahoo!
> > as well). I had a chance to talk to one Google engineer and I asked him
> > exactly this question ("Do you use MapReduce even if it is known that
> > specific problems can be solved with different architecture in a more
> > efficient way?"). And the answer was "yes". I don't know if Google
> > engineers
> > are using different architectures - and I would be surprised if not - but
> > they probably don't use it at the same scale as they do with MapReduce.
> > There are several good reasons for this: MapReduce is easy to learn
> (which
> > means that even fresh interns can use it very quickly - even if not in an
> > efficient way), they probably have very nice visual tools for managing
> > MapReduce clusters and finally they have a lot of HW resources.
> >
> > BTW: Andrzej, do you consider contributing your graph processing utilities
> > into
> > Hadoop or Mahout?
> >
> > Regards,
> > Lukas
> >
> > On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner 
> > wrote:
> >
> > > Andrzej,
> > > without deeper understanding of exactly what you are doing, I have a
> gut
> > > feeling that a different distributed system might be a better fit for
> > this
> > > specific task. I assume, you are dealing with very large graphs if you
> > are
> > > using Hadoop, and you want grid processing. But the linear nature of
> > > Map/Reduce may make it hard to fit. As the MapReduce paper said, not
> > every
> > > task can be easily expressed this way.
> > >
> > > The other technology I mean is JavaSpaces, of which I usually use the
> > > GigaSpaces implementation. This allows more complex algorithms. You
> will
> > > store your complete graph as an appropriate structure in a JavaSpace,
> and
> > > will also restructure it for parallel processing, as outlined in some
> > > JavaSpaces books. Then you can have as many workers as you want, working
> > > on individual nodes.

Re: Finding longest path in a graph

2009-02-01 Thread Lukáš Vlček
Ricky,
Are you aware of Mahout project?
http://lucene.apache.org/mahout/

I think this project tries to address some of the algorithms you have
mentioned.

Regards,
Lukas

2009/2/2 Ricky Ho 

> Yes, Map/Reduce model itself is simple.  But transforming a non-trivial
> algorithm into Map/Reduce is not simple.
>
> I wonder if there is any effort from academia to look into building a
> catalog of how each of our familiar algorithms can be transformed into
> Map/Reduce, such as ...
> 1) Sorting
> 2) Searching (index tree, hash, geo/spacial search)
> 3) Network/Graph processing (min spanning tree, shortest path, network
> diameter)
> 4) Computational Geometry (Ray tracing, Convex Hull)
> 5) Optimization problem (Maximum flow, Linear programming, Hill climbing)
> 6) Machine learning (logistic regression, nearest neighbor, cluster,
> Bayesian classification, decision tree, neural network)
>
> It is also important to recognize there are other models in our tool box
> (besides map/reduce) for parallelizing an algorithm, such as ...
> a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace)
> b) Dependency graph / Data flow programming (e.g. Dryad)
> c) MPI
> d) Multi-thread / Shared memory model
> e) ... anything else ...
>
> Rgds,
> Ricky
>
> -----Original Message-----
> From: Lukáš Vlček [mailto:lukas.vl...@gmail.com]
> Sent: Sunday, February 01, 2009 11:37 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Finding longest path in a graph
>
> Hi,
> just my 2 cents.
> You are right that MapReduce does not fit all problems. Especially when
> it comes to processing graphs. On the other hand I think even in Google
> they
> stick with MapReduce for complex graph processing (and they do it in Yahoo!
> as well). I had a chance to talk to one Google engineer and I asked him
> exactly this question ("Do you use MapReduce even if it is known that
> specific problems can be solved with different architecture in a more
> efficient way?"). And the answer was "yes". I don't know if Google
> engineers
> are using different architectures - and I would be surprised if not - but
> they probably don't use it at the same scale as they do with MapReduce.
> There are several good reasons for this: MapReduce is easy to learn (which
> means that even fresh interns can use it very quickly - even if not in an
> efficient way), they probably have very nice visual tools for managing
> MapReduce clusters and finally they have a lot of HW resources.
>
> BTW: Andrzej, do you consider contributing your graph processing utilities
> into
> Hadoop or Mahout?
>
> Regards,
> Lukas
>
> On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner 
> wrote:
>
> > Andrzej,
> > without deeper understanding of exactly what you are doing, I have a gut
> > feeling that a different distributed system might be a better fit for
> this
> > specific task. I assume, you are dealing with very large graphs if you
> are
> > using Hadoop, and you want grid processing. But the linear nature of
> > Map/Reduce may make it hard to fit. As the MapReduce paper said, not
> every
> > task can be easily expressed this way.
> >
> > The other technology I mean is JavaSpaces, of which I usually use the
> > GigaSpaces implementation. This allows more complex algorithms. You will
> > store your complete graph as an appropriate structure in a JavaSpace, and
> > will also restructure it for parallel processing, as outlined in some
> > JavaSpaces books. Then you can have as many workers as you want, working
> on
> > individual nodes.
> >
> > Mark
> >
> > On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki 
> wrote:
> >
> > > Hi,
> > >
> > > I'm looking for advice. I need to process a directed graph encoded
> as
> > a
> > > list of  pairs. The goal is to compute a list of longest
> paths
> > in
> > > the graph. There is no guarantee that the graph is acyclic, so there
> > should
> > > be some mechanism to detect cycles.
> > >
> > > Currently I'm using a simple approach consisting of the following: I
> > encode
> > > the graph as >, and extend
> the
> > > paths by one degree at a time. This means that in order to find the
> > longest
> > > path of degree N it takes N + 1 map-reduce jobs.
> > >
> > > Are you perhaps aware of a smarter way to do it? I would appreciate any
> > > pointers.
> > >
> > > --
> > > Best regards,
> > > Andrzej Bialecki <><
> > >  ___. ___ ___ ___ _ _   __
> > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > http://www.sigram.com  Contact: info at sigram dot com
> > >
> > >
> >
>
>
>
> --
> http://blog.lukas-vlcek.com/
>



-- 
http://blog.lukas-vlcek.com/


Re: Finding longest path in a graph

2009-02-01 Thread Lukáš Vlček
Hi,
just my 2 cents.
You are right that MapReduce does not fit all problems, especially when
it comes to processing graphs. On the other hand, I think even at Google they
stick with MapReduce for complex graph processing (and they do it at Yahoo!
as well). I had a chance to talk to a Google engineer and I asked him
exactly this question ("Do you use MapReduce even if it is known that
specific problems can be solved with a different architecture in a more
efficient way?"). And the answer was "yes". I don't know if Google engineers
are using different architectures - and I would be surprised if not - but
they probably don't use them at the same scale as they do MapReduce.
There are several good reasons for this: MapReduce is easy to learn (which
means that even fresh interns can use it very quickly, even if not in an
efficient way), they probably have very nice visual tools for managing
MapReduce clusters, and finally they have a lot of HW resources.

BTW: Andrzej, do you consider contributing your graph processing utilities
to Hadoop or Mahout?

Regards,
Lukas

On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner  wrote:

> Andrzej,
> without deeper understanding of exactly what you are doing, I have a gut
> feeling that a different distributed system might be a better fit for this
> specific task. I assume, you are dealing with very large graphs if you are
> using Hadoop, and you want grid processing. But the linear nature of
> Map/Reduce may make it hard to fit. As the MapReduce paper said, not every
> task can be easily expressed this way.
>
> The other technology I mean is JavaSpaces, of which I usually use the
> GigaSpaces implementation. This allows more complex algorithms. You will
> store your complete graph as an appropriate structure in a JavaSpace, and
> will also restructure it for parallel processing, as outlined in some
> JavaSpaces books. Then you can have as many workers as you want, working on
> individual nodes.
>
> Mark
>
> On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki  wrote:
>
> > Hi,
> >
> > I'm looking for advice. I need to process a directed graph encoded as
> a
> > list of  pairs. The goal is to compute a list of longest paths
> in
> > the graph. There is no guarantee that the graph is acyclic, so there
> should
> > be some mechanism to detect cycles.
> >
> > Currently I'm using a simple approach consisting of the following: I
> encode
> > the graph as >, and extend the
> > paths by one degree at a time. This means that in order to find the
> longest
> > path of degree N it takes N + 1 map-reduce jobs.
> >
> > Are you perhaps aware of a smarter way to do it? I would appreciate any
> > pointers.
> >
> > --
> > Best regards,
> > Andrzej Bialecki <><
> >  ___. ___ ___ ___ _ _   __
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> >
> >
>



-- 
http://blog.lukas-vlcek.com/


Re: ApacheCon US 2008

2008-10-31 Thread Lukáš Vlček
Hi,
Hope somebody will record at least a fraction of these talks and put them on
the web as soon as possible.
Lukas

On Fri, Oct 31, 2008 at 6:46 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

> Just a reminder that ApacheCon US is next week in New Orleans. There will
> be a lot of Hadoop developers and talks. (I'm CC'ing core-user because it
> has the widest coverage. Please join the low traffic [EMAIL PROTECTED] list
> for cross sub-project announcements.)
>
>* Hadoop Camp with lots of talks about Hadoop
>  o Introduction to Hadoop by Owen O'Malley
>  o A Tour of Apache Hadoop by Tom White
>  o Programming Hadoop Map/Reduce by Arun Murthy
>  o Hadoop at Yahoo! by Eric Baldeschwieler
>  o Hadoop Futures Panel
>  o Using Hadoop for an Intranet Search Engine by Shivakumar
> Vaithyanthan
>  o Cloud Computing Testbed by Thomas Sandholm
>  o Improving Virtualization and Performance Tracing of Hadoop with
> Open Solaris
>  by George Porter
>  o An Insight into Hadoop Usage at Facebook by Dhruba Borthakur
>  o Pig by Alan Gates
>  o Zookeeper, Coordinating the Distributed Application by Ben Reed
>  o Querying JSON Data on Hadoop using Jaql by Kevin Beyer
>  o HBase by Michael Stack
>* Hadoop training on Practical Problem Solving in Hadoop
>* Cloudera is providing a test Hadoop cluster and a Hadoop hacking
> contest.
>
> There is also a new Hadoop tutorial available.
>
> -- Owen




-- 
http://blog.lukas-vlcek.com/


Re: Official group blog of the hadoop user/dev group?

2008-10-08 Thread Lukáš Vlček
Hi,

Well, not a bad idea, I think. But isn't a wiki a better tool to capture and
shape collective knowledge?
Lukas

On Wed, Oct 8, 2008 at 5:39 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:

> Edward J. Yoon wrote:
>
>> If we have a group blog of the hadoop user/dev group such as a Y!
>> developer network, we can easily share/introduce our experience and
>> outcomes from our research. So, I thought about a group blog, I guess
>> there are plenty of contributors.
>>
>> What do you think about it?
>>
>
> there's the planetapache aggregator; you could join that or set up a
> similar 'planet' up to pull in the different feeds of different people
>



-- 
http://blog.lukas-vlcek.com/


Re: graphics in hadoop

2008-10-07 Thread Lukáš Vlček
Hi,

Hadoop is a platform for distributed computing. It typically runs on a
cluster of dedicated servers (though expensive HW is not required); as far
as I know, it is not meant to be a platform for applications running on the
client.
Hadoop is very general and not limited by the nature of the data, which
means you should be able to process image data as well.

Regards,
Lukas

On Tue, Oct 7, 2008 at 10:51 AM, chandra <
[EMAIL PROTECTED]> wrote:

>
>
> hi
>
> does hadoop support graphics packages for displaying some images..?
>
>
> --
> Best Regards
> S.Chandravadana
>
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information.
> If you are not the intended recipient, please contact the sender by reply
> e-mail and destroy all copies of the original message.
> Any unauthorized review, use, disclosure, dissemination, forwarding,
> printing or copying of this email or any action taken in reliance on this
> e-mail is strictly
> prohibited and may be unlawful.
>


Re: Hadoop Book

2008-08-28 Thread Lukáš Vlček
Tom,

Do you think you could drop a small note into this list once it is
available?

Lukas

2008/8/28 Tom White <[EMAIL PROTECTED]>

> That's right, I'm writing a book on Hadoop for O'Reilly. It will be a
> part of the Rough Cuts program (http://oreilly.com/roughcuts/), which
> means it'll be available as writing progresses.
>
> Tom
>
> 2008/8/28 Lukáš Vlček <[EMAIL PROTECTED]>:
> > BTW: I found (http://skillsmatter.com/custom/presentations/ec2-talk.pdf)
> > that Tom White is working on Hadoop book now.
> >
> > Lukas
> >
> > 2008/8/26 Feris Thia <[EMAIL PROTECTED]>
> >
> >> Hi Lukas,
> >>
> >> I've check on Youtube.. and yes, there are many explanations on Hadoop.
> >>
> >> Thanks for your guide :)
> >>
> >> Regards,
> >>
> >> Feris
> >>
> >> On Tue, Aug 26, 2008 at 1:39 AM, Lukáš Vlček <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > As far as I know, there is no Hadoop-specific book yet. However, you
> can
> >> > find several interesting video presentations from Google or Yahoo!
> Hadoop
> >> > meetings. There are good tutorials on the net as well as several
> >> > interesting
> >> > blog posts (several people involved in Hadoop development do regularly
> >> blog
> >> > about Hadoop) and you can read user and dev mail lists (and you can
> also
> >> > ask
> >> > questions there! - you can not do this with the book).
> >> >
> >> > On the other hand Hadoop is under development and as such the API can
> >> change
> >> > and new features can be added every day. Hadoop is not settled down the
> >> same
> >> > way Oracle is now. But I am *sure* the book about Hadoop is
> coming
> >> in
> >> > the future because there is a demand...
> >> >
> >> > Regards,
> >> > Lukas
> >> >
> >> >
> >> > --
> >> > http://blog.lukas-vlcek.com/
> >> >
> >>
> >
> >
> >
> > --
> > http://blog.lukas-vlcek.com/
> >
>



-- 
http://blog.lukas-vlcek.com/


Re: Hadoop Book

2008-08-28 Thread Lukáš Vlček
BTW: I found (http://skillsmatter.com/custom/presentations/ec2-talk.pdf)
that Tom White is working on Hadoop book now.

Lukas

2008/8/26 Feris Thia <[EMAIL PROTECTED]>

> Hi Lukas,
>
> I've check on Youtube.. and yes, there are many explanations on Hadoop.
>
> Thanks for your guide :)
>
> Regards,
>
> Feris
>
> On Tue, Aug 26, 2008 at 1:39 AM, Lukáš Vlček <[EMAIL PROTECTED]>
> wrote:
>
> > Hi,
> >
> > As far as I know, there is no Hadoop-specific book yet. However, you can
> > find several interesting video presentations from Google or Yahoo! Hadoop
> > meetings. There are good tutorials on the net as well as several
> > interesting
> > blog posts (several people involved in Hadoop development do regularly
> blog
> > about Hadoop) and you can read user and dev mail lists (and you can also
> > ask
> > questions there! - you can not do this with the book).
> >
> > On the other hand Hadoop is under development and as such the API can
> change
> > and new features can be added every day. Hadoop is not settled down the
> same
> > way Oracle is now. But I am *sure* the book about Hadoop is coming
> in
> > the future because there is a demand...
> >
> > Regards,
> > Lukas
> >
> >
> > --
> > http://blog.lukas-vlcek.com/
> >
>



-- 
http://blog.lukas-vlcek.com/


Re: Hadoop Book

2008-08-25 Thread Lukáš Vlček
Hi,

As far as I know, there is no Hadoop-specific book yet. However, you can
find several interesting video presentations from Google or Yahoo! Hadoop
meetings. There are good tutorials on the net, as well as several interesting
blog posts (several people involved in Hadoop development blog regularly
about Hadoop), and you can read the user and dev mailing lists (where you can
also ask questions! - you cannot do that with a book).

On the other hand, Hadoop is under development, and as such the API can
change and new features can be added every day. Hadoop is not settled down
the same way Oracle is now. But I am *sure* a book about Hadoop is coming in
the future, because there is demand...

Regards,
Lukas

On Mon, Aug 25, 2008 at 7:14 PM, Feris Thia <[EMAIL PROTECTED]>wrote:

> Hi,
>
> I'm a beginner to Hadoop. Is there any good book on this technology ?
>
> Regards,
>
> Feris
>



-- 
http://blog.lukas-vlcek.com/


Re: Setting up a Hadoop cluster where nodes are spread over the Internet

2008-08-08 Thread Lukáš Vlček
HI,

I am not an expert on Hadoop configuration, but is this safe? As far as I
understand, the IP address is public and the connection to the datanode port
is not secured. Am I correct?

Lukas

On Fri, Aug 8, 2008 at 8:35 AM, Lucas Nazário dos Santos <
[EMAIL PROTECTED]> wrote:

> Hello again,
>
> In fact I can get the cluster up and running with two nodes in different
> LANs. The problem appears when executing a job.
>
> As you can see in the piece of log below, the datanode tries to communicate
> with the namenode using the IP 10.1.1.5. The issue is that the datanode
> should be using a valid IP, and not 10.1.1.5.
>
> Is there a way of manually configuring the datanode with the namenode's IP,
> so I can change from 10.1.1.5 to, say 189.11.131.172?
>
> Thanks,
> Lucas
>
>
> 2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker:
> TaskTracker up at: localhost/127.0.0.1:60394
> 2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker: Starting
> tracker tracker_localhost:localhost/127.0.0.1:60394
> 2008-08-08 02:34:23,589 INFO org.apache.hadoop.mapred.TaskTracker: Starting
> thread: Map-events fetcher for all reduce tasks on
> tracker_localhost:localhost/127.0.0.1:60394
> 2008-08-08 03:06:43,239 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction: task_200808080234_0001_m_00_0
> 2008-08-08 03:07:43,989 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /10.1.1.5:9000. Already tried 1 time(s).
> 2008-08-08 03:08:44,999 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /10.1.1.5:9000. Already tried 2 time(s).
> 2008-08-08 03:09:45,999 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /10.1.1.5:9000. Already tried 3 time(s).
> 2008-08-08 03:10:47,009 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /10.1.1.5:9000. Already tried 4 time(s).
> 2008-08-08 03:11:48,009 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /10.1.1.5:9000. Already tried 5 time(s).
> 2008-08-08 03:12:49,026 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /10.1.1.5:9000. Already tried 6 time(s).
> 2008-08-08 03:13:50,036 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /10.1.1.5:9000. Already tried 7 time(s).
> 2008-08-08 03:14:51,046 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /10.1.1.5:9000. Already tried 8 time(s).
> 2008-08-08 03:15:52,056 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /10.1.1.5:9000. Already tried 9 time(s).
> 2008-08-08 03:16:53,066 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /10.1.1.5:9000. Already tried 10 time(s).
> 2008-08-08 03:17:54,077 WARN org.apache.hadoop.mapred.TaskTracker: Error
> initializing task_200808080234_0001_m_00_0:
> java.net.SocketTimeoutException
>at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:109)
>at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:174)
>at org.apache.hadoop.ipc.Client.getConnection(Client.java:623)
>at org.apache.hadoop.ipc.Client.call(Client.java:546)
>at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
>at org.apache.hadoop.dfs.$Proxy5.getProtocolVersion(Unknown Source)
>at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:313)
>at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:102)
>at org.apache.hadoop.dfs.DFSClient.(DFSClient.java:178)
>at
>
> org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:68)
>at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1280)
>at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
>at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1291)
>at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:203)
>at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:152)
>at
> org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:670)
>at
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1274)
>at
> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:915)
>at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1310)
>at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2251)
>
>
>
> On Fri, Aug 8, 2008 at 12:16 AM, Lucas Nazário dos Santos <
> [EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > Can someone point me out what are the extra tasks that need to be
> performed
> > in order to set up a cluster where nodes are spread over the Internet, in
> > different LANs?
> >
> > Do I need to free any datanode/namenode ports? How do I get the datanodes
> > to know the valid namenode IP, and not something like 10.1.1.1?
> >
> > Any help is appreciate.
> >
> > Lucas
> >
>



-- 
http://blog.lukas-vlcek.com/