Thanks for your time. I have tweeted about the graph DB formats; I know some of my followers are working with them, so they might be interested.

On 25 March 2012 at 19:25, Praveen Sripati <[email protected]> wrote:

> I have created umbrella JIRA HAMA-536 for creating the
> InputFormats/OutputFormats, with three sub-tasks. For now I have assigned
> the tasks to myself; let me know if anyone is interested.
>
> Praveen
>
> On Sun, Mar 25, 2012 at 6:40 PM, Thomas Jungblut <[email protected]> wrote:
>
> > > I can open a JIRA. I need input on which InputFormats make sense and
> > > their priority. Some we can port from Hadoop.
> >
> > Yep, you're right. I guess a single JIRA would be enough for the formats
> > already implemented in Hadoop; for the others we need subclasses.
> > Formats that I really wanted to have would be:
> >
> > - DBInputFormat[1]
> > - XMLInputFormat
> > - NLineInputFormat
> > - CSVInputFormat (we could use OpenCSV for that in conjunction with
> >   TextInputFormat)
> > - JSONInputFormat (for OpenGraph stuff)
> > - The graph DB formats (Neo4J and whatever the others are called)
> >
> > Anything I missed for "full" coverage?
> >
> > > Could you please elaborate on this?
> >
> > Sure, DMOZ is some kind of crawled website database. It is used in some
> > PageRank examples for testing; I don't know if it was in Mahout. We could
> > also use it since we have PageRank as well.
> > CommonCrawl is a new, upcoming DMOZ-like database of many crawled sites;
> > it is hosted on S3 in the Amazon cloud. We run on EC2 via Whirr, so this
> > could be a cool example as well.
> >
> > [1] http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.html
> >
> > On 25 March 2012 at 14:56, Praveen Sripati <[email protected]> wrote:
> >
> > > Thomas et al,
> > >
> > > > Would someone please open JIRAs for that?
> > >
> > > I can open a JIRA. I need input on which InputFormats make sense and
> > > their priority. Some we can port from Hadoop.
> > >
> > > > Based on XML we can implement a format that parses DMOZ or
> > > > CommonCrawl on Amazon S3.
> > >
> > > Could you please elaborate on this?
> > >
> > > Praveen
> > >
> > > On Sun, Mar 25, 2012 at 5:14 PM, Chia-Hung Lin <[email protected]> wrote:
> > >
> > > > As I understand it, many iterative applications don't require
> > > > key-value input/output and additionally need random access
> > > > (read/write) to a particular file. An I/O interface, e.g. MPI, may
> > > > increase flexibility here.
> > > >
> > > > https://issues.apache.org/jira/browse/MAPREDUCE-2911
> > > >
> > > > On 25 March 2012 10:01, Praveen Sripati <[email protected]> wrote:
> > > > > Hi,
> > > > >
> > > > > For Hama there are limited input formats:
> > > > >
> > > > > CombineFileInputFormat, FileInputFormat, NullInputFormat,
> > > > > SequenceFileInputFormat, TextInputFormat
> > > > >
> > > > > Does it make sense to have more input formats? I was thinking of
> > > > > InputFormats for graph databases.
> > > > >
> > > > > Any feedback on the different input formats is welcome.
> > > > >
> > > > > I quickly glanced at Giraph and Hadoop, and they have more
> > > > > InputFormats, which makes it easy to plug them into external
> > > > > systems.
> > > > >
> > > > > Praveen
> >
> > --
> > Thomas Jungblut
> > Berlin <[email protected]>

--
Thomas Jungblut
Berlin <[email protected]>
