Great, Praveen!

On Wed, Mar 28, 2012 at 10:33 AM, Praveen Sripati <[email protected]> wrote:
> Ed,
>
> After I have finished porting the Hadoop formats to Hama, I can work on it.
>
> I have created a sub-task HAMA-544 for HBase InputFormat.
>
> Praveen
>
> On Wed, Mar 28, 2012 at 4:33 AM, Edward J. Yoon <[email protected]> wrote:
>
>> Nice discussion!
>>
>> BTW, is anyone interested in contributing HBase table input/output formats?
>>
>> On Mon, Mar 26, 2012 at 2:27 AM, Thomas Jungblut <[email protected]> wrote:
>> > Thanks for your time.
>> > I have tweeted about the graph DB formats; I know some of my followers are
>> > working with them, so they might be interested.
>> >
>> > On 25 March 2012 19:25, Praveen Sripati <[email protected]> wrote:
>> >
>> >> I have created Umbrella JIRA HAMA-536 for creating the
>> >> InputFormats/OutputFormats with three sub-tasks. For now I have assigned
>> >> the tasks to myself; let me know if anyone is interested.
>> >>
>> >> Praveen
>> >>
>> >> On Sun, Mar 25, 2012 at 6:40 PM, Thomas Jungblut <[email protected]> wrote:
>> >>
>> >> > > I can open a JIRA. I need input on which InputFormats make sense and
>> >> > > their priority. Some we can port from Hadoop.
>> >> >
>> >> > Yep, you're right. I guess a single JIRA would be enough for the formats
>> >> > already implemented in Hadoop; for the others we need subclasses.
>> >> > Formats I'd really like to have are:
>> >> >
>> >> > - DBInputFormat [1]
>> >> > - XMLInputFormat
>> >> > - NLineInputFormat
>> >> > - CSVInputFormat (we could use OpenCSV for that in conjunction with
>> >> >   TextInputFormat)
>> >> > - JSONInputFormat (for OpenGraph stuff)
>> >> > - The graph DB formats: Neo4j and whatever the others are called
>> >> >
>> >> > Anything I missed for "full" coverage?
>> >> >
>> >> > > Could you please elaborate on this?
>> >> >
>> >> > Sure. DMOZ is a kind of website database. It is used in some PageRank
>> >> > examples for testing; I don't know whether it was in Mahout. We could
>> >> > also use it, since we have PageRank as well.
>> >> > CommonCrawl is a new, upcoming DMOZ-like database of many crawled sites;
>> >> > it is hosted on S3 in the Amazon cloud. We run on EC2 via Whirr, so this
>> >> > could be a cool example as well.
>> >> >
>> >> > [1] http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.html
>> >> >
>> >> > On 25 March 2012 14:56, Praveen Sripati <[email protected]> wrote:
>> >> >
>> >> > > Thomas et al,
>> >> > >
>> >> > > > Would someone please open JIRAs for that?
>> >> > >
>> >> > > I can open a JIRA. I need input on which InputFormats make sense and
>> >> > > their priority. Some we can port from Hadoop.
>> >> > >
>> >> > > > Based on XML we can implement a format that parses DMOZ or
>> >> > > > CommonCrawl on Amazon S3.
>> >> > >
>> >> > > Could you please elaborate on this?
>> >> > >
>> >> > > Praveen
>> >> > >
>> >> > > On Sun, Mar 25, 2012 at 5:14 PM, Chia-Hung Lin <[email protected]> wrote:
>> >> > >
>> >> > > > As I understand it, many iterative applications don't require
>> >> > > > key/value input/output and additionally need random access
>> >> > > > (read/write) to a particular file. An I/O interface such as MPI's
>> >> > > > may increase flexibility here.
>> >> > > >
>> >> > > > https://issues.apache.org/jira/browse/MAPREDUCE-2911
>> >> > > >
>> >> > > > On 25 March 2012 10:01, Praveen Sripati <[email protected]> wrote:
>> >> > > > > Hi,
>> >> > > > >
>> >> > > > > Hama has only a limited set of input formats:
>> >> > > > >
>> >> > > > > CombineFileInputFormat, FileInputFormat, NullInputFormat,
>> >> > > > > SequenceFileInputFormat, TextInputFormat
>> >> > > > >
>> >> > > > > Does it make sense to have more input formats? I was thinking of
>> >> > > > > InputFormats for graph databases.
>> >> > > > >
>> >> > > > > Any feedback on the different input formats is welcome.
>> >> > > > >
>> >> > > > > I quickly glanced at Giraph and Hadoop; they have more InputFormats,
>> >> > > > > which makes it easy to plug them into external systems.
>> >> > > > >
>> >> > > > > Praveen
>> >> >
>> >> > --
>> >> > Thomas Jungblut
>> >> > Berlin <[email protected]>
>> >
>> > --
>> > Thomas Jungblut
>> > Berlin <[email protected]>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
--
Best Regards, Edward J. Yoon
@eddieyoon
