Hi, besides HBase you can use SequenceFiles, which store key/value pairs. Typically you would use something like <VectorWritable, NullWritable> pairs; VectorWritable comes, for example, from Mahout, which has a good math package for sparse and dense vectors.
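To make the dense vs. sparse distinction above a bit more concrete, here is a minimal sketch in plain Java. It does not use the Hadoop, Hama, or Mahout classes at all: `VectorRecordSketch` and its methods are hypothetical names made up for illustration, and the byte layout is an assumption of mine, not the actual on-disk format of VectorWritable or SequenceFile. It only shows the general idea of serializing a vector through a `DataOutput`-style stream, once with every value written out and once with only the non-zero (index, value) pairs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Hadoop, Hama, or Mahout.
final class VectorRecordSketch {

    // Dense layout (assumed): element count, then every value in index order.
    static byte[] encodeDense(double[] values) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(values.length);
            for (double v : values) {
                out.writeDouble(v);
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static double[] decodeDense(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            double[] values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sparse layout (assumed): number of non-zero entries, then (index, value) pairs.
    static byte[] encodeSparse(Map<Integer, Double> nonZeros) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(nonZeros.size());
            for (Map.Entry<Integer, Double> entry : nonZeros.entrySet()) {
                out.writeInt(entry.getKey());
                out.writeDouble(entry.getValue());
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<Integer, Double> decodeSparse(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            int count = in.readInt();
            Map<Integer, Double> nonZeros = new TreeMap<>();
            for (int i = 0; i < count; i++) {
                int index = in.readInt();
                nonZeros.put(index, in.readDouble());
            }
            return nonZeros;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The same 4-element vector, once dense and once as non-zero entries only.
        double[] dense = {1.0, 0.0, 0.0, 2.5};
        Map<Integer, Double> sparse = new TreeMap<>();
        sparse.put(0, 1.0);
        sparse.put(3, 2.5);

        System.out.println(Arrays.toString(decodeDense(encodeDense(dense))));
        System.out.println(decodeSparse(encodeSparse(sparse)));
        System.out.println("dense bytes:  " + encodeDense(dense).length);
        System.out.println("sparse bytes: " + encodeSparse(sparse).length);
    }
}
```

In this toy layout the sparse form only pays off when most entries are zero, since every stored entry also carries its index. With real data you would let the Writable class of your choice handle the serialization and use it as the key of the SequenceFile records.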
If you don't want vector classes, you can use ArrayWritable for dense data and MapWritable for sparse data. It also depends on what you're doing with your data, so if you can share more about the algorithm, we can give you a better suggestion ;)

On 28 March 2012 00:51, Edward J. Yoon <[email protected]> wrote:

> Hi,
>
> I believe that HBase is the best way to store multi-dimensional
> arrays. HBase provides storage efficiencies as the number of dimensions
> grows, ordering capability, and also allows you to record and access
> data corrections and updates directly via the HBase client library.
>
> Another option is the use of SequenceFile and MapFile. Once the data is
> loaded into the program initially, your math operations can run directly
> in memory and be synchronized using the standard BSP APIs.
>
> Thanks.
>
> On Wed, Mar 28, 2012 at 12:46 AM, Noah Watkins <[email protected]>
> wrote:
> > Hi Hama list,
> >
> > I'm interested in using Hama to process large multi-dimensional arrays
> > (sparse and dense). What is the best way to store and represent this type
> > of data for processing in Hama?
> >
> > Thanks,
> > Noah
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

--
Thomas Jungblut
Berlin <[email protected]>
