Re: InputFormats for Hama

Edward J. Yoon Tue, 27 Mar 2012 22:55:59 -0700

Nice discussion!

BTW, is anyone interested in contributing HBase table input/output formatters?


On Mon, Mar 26, 2012 at 2:27 AM, Thomas Jungblut
<[email protected]> wrote:
> Thanks for your time.
> I have tweeted about the graph DB formats; I know some of my followers are
> working with them, so they might be interested.
>
> On 25 March 2012 19:25, Praveen Sripati <[email protected]> wrote:
>
>> I have created the umbrella JIRA HAMA-536 for creating the
>> InputFormats/OutputFormats, with three sub-tasks. For now I have assigned
>> the tasks to myself; let me know if anyone is interested.
>>
>> Praveen
>>
>> On Sun, Mar 25, 2012 at 6:40 PM, Thomas Jungblut <[email protected]> wrote:
>>
>> > >
>> > > I can open a JIRA. I need input on which InputFormats make sense and
>> > > their priority. Some we can port from Hadoop.
>> >
>> >
>> > Yep, you're right. I guess a single JIRA would be enough for the formats
>> > already implemented in Hadoop; for the others we need subclasses.
>> > Formats that I would really like to have are:
>> >
>> >   - DBInputFormat[1]
>> >   - XMLInputFormat
>> >   - NLineInputFormat
>> >   - CSVInputFormat (we could use OpenCSV for that in conjunction with
>> >   TextInputFormat)
>> >   - JSONInputFormat (for OpenGraph stuff)
>> >   - The graph DB formats: Neo4j and the others, whatever they are called
>> >
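The CSV idea above (OpenCSV on top of TextInputFormat) boils down to turning each line that TextInputFormat delivers into a list of fields. Below is a minimal Java sketch of that per-line step, assuming simple RFC 4180-style quoting within a single line. The class name CsvLineParser is hypothetical, and this hand-rolled parser stands in for OpenCSV; it does not reproduce OpenCSV's API. Fields containing line breaks are not handled, since TextInputFormat already splits records at newlines.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: splits one line of text (as TextInputFormat would hand
 * it to a record reader) into CSV fields. Stand-in for OpenCSV, supporting
 * quoted fields and doubled quotes ("") within a single line only.
 */
public final class CsvLineParser {

    private CsvLineParser() {}

    public static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    // A doubled quote inside a quoted field is an escaped quote.
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"');
                        i++;
                    } else {
                        inQuotes = false; // closing quote of the field
                    }
                } else {
                    current.append(c); // commas inside quotes are literal
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString()); // field separator
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field (possibly empty)
        return fields;
    }

    public static void main(String[] args) {
        // Prints the parsed fields of a sample line.
        System.out.println(parse("1,\"Berlin, DE\",\"say \"\"hi\"\"\""));
    }
}
```

A record reader wrapping TextInputFormat's line reader could call parse() on each line and emit the field list as the record value; that wrapper is not shown, and no particular Hama reader signature is assumed here.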
>> > Anything I missed for "full" coverage?
>> >
>> > > Could you please elaborate on this?
>> >
>> >
>> > Sure, DMOZ (the Open Directory Project) is a kind of website directory
>> > database. It is used in some PageRank examples for testing; I don't know
>> > if it was in Mahout. We could use it too, since we have PageRank as well.
>> > CommonCrawl is a new, up-and-coming DMOZ-like database of many crawled
>> > sites; it is hosted on Amazon S3. We run on EC2 via Whirr, so this could
>> > be a cool example as well.
>> >
>> > [1]
>> > http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.html
>> >
>> >
>> > On 25 March 2012 14:56, Praveen Sripati <[email protected]> wrote:
>> >
>> > > Thomas et al,
>> > >
>> > > > Would someone please open JIRAs for that?
>> > >
>> > > I can open a JIRA. I need input on which InputFormats make sense and
>> > > their priority. Some we can port from Hadoop.
>> > >
>> > > > Based on XML we can implement a format that parses DMOZ or CommonCrawl
>> > > > on Amazon S3.
>> > >
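One common way to build an XML-based format for dumps like DMOZ is to scan the input for a configured start tag and end tag and emit everything between them as one record. Below is a minimal Java sketch of that scanning step over an in-memory string. The class name XmlRecordScanner and the `<page` / `</page>` tag names are hypothetical; the element names of the real DMOZ or CommonCrawl data would need to be checked.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the core of an XML record reader: find each
 * occurrence of startTag, read up to the next endTag, and emit that
 * substring (tags included) as one record. Tag names are assumptions.
 */
public final class XmlRecordScanner {

    private final String startTag;
    private final String endTag;

    public XmlRecordScanner(String startTag, String endTag) {
        this.startTag = startTag;
        this.endTag = endTag;
    }

    public List<String> records(String text) {
        List<String> out = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(startTag, from);
            if (start < 0) {
                break; // no further records
            }
            int end = text.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record: stop scanning
            }
            int stop = end + endTag.length();
            out.add(text.substring(start, stop));
            from = stop; // continue after the emitted record
        }
        return out;
    }

    public static void main(String[] args) {
        String dump = "<root><page id=\"1\">a</page>noise<page id=\"2\">b</page></root>";
        XmlRecordScanner scanner = new XmlRecordScanner("<page", "</page>");
        for (String record : scanner.records(dump)) {
            System.out.println(record);
        }
    }
}
```

In an actual InputFormat the scan would run over a file split, so a record that starts near the end of one split has to be read past the split boundary (and skipped by the next split); that bookkeeping is left out. Matching on a bare prefix such as `<page` would also match `<pages`, so real tag patterns need some care.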
>> > > Could you please elaborate on this?
>> > >
>> > > Praveen
>> > >
>> > >
>> > > On Sun, Mar 25, 2012 at 5:14 PM, Chia-Hung Lin <[email protected]> wrote:
>> > >
>> > > > As I understand it, many iterative applications don't require key/value
>> > > > input/output and additionally need random access (read/write) to a
>> > > > particular file. An I/O interface such as MPI's may increase flexibility
>> > > > here.
>> > > >
>> > > > https://issues.apache.org/jira/browse/MAPREDUCE-2911
>> > > >
>> > > > On 25 March 2012 10:01, Praveen Sripati <[email protected]> wrote:
>> > > > > Hi,
>> > > > >
>> > > > > For Hama there are only a few input formats:
>> > > > >
>> > > > > CombineFileInputFormat, FileInputFormat, NullInputFormat,
>> > > > > SequenceFileInputFormat, TextInputFormat
>> > > > >
>> > > > > Does it make sense to have more input formats? I was thinking of
>> > > > > InputFormats for graph databases.
>> > > > >
>> > > > > Any feedback for the different input formats is welcome.
>> > > > >
>> > > > > I quickly glanced at Giraph and Hadoop; they have more InputFormats,
>> > > > > which makes it easy to plug them into external systems.
>> > > > >
>> > > > > Praveen
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thomas Jungblut
>> > Berlin <[email protected]>
>> >
>>
>
>
>
> --
> Thomas Jungblut
> Berlin <[email protected]>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
