Thanks for the answer, Dmitriy. I am interested in implementing this feature. My thought on how to do it is as follows:
- abstract ParquetFileWriter -> ParquetDataWriter. writeDictionaryPage, writeDataPage are abstract methods.
- ParquetFileWriter implements ParquetDataWriter, writing the data to Hadoop-compatible files.
- ParquetCassandraWriter implements ParquetDataWriter, writing data to Cassandra:
  -- for each page, metadata is written to the Metadata CF, with key <parquet-file-name>:<row-chunk>:<column>:<page>
  -- for each page, data is written to the Data CF, with key <parquet-file-name>:<row-chunk>:<column>:<page>
  -- the footer is written to the Metadata CF, with key <parquet-file-name>
- abstract ParquetFileReader -> ParquetDataReader. readNextRowGroup, readFooter are abstract methods. Chunk will also need to be abstract.
- ParquetFileReader implements ParquetDataReader, reading from Hadoop-compatible files.
- ParquetCassandraReader implements ParquetDataReader, reading from Cassandra.
- ParquetDataWriter and ParquetDataReader are instantiated through reflection? (That way users can implement and use arbitrary readers and writers.)

Does this make sense?

Thanks!

On Wed, Mar 18, 2015 at 9:30 AM, Dmitriy Ryaboy <[email protected]> wrote:

> That's an interesting thought. No one has done this that I am aware of, but
> I'd be curious to see what the results of doing this are.
>
> You could approximate something like this using the Cassandra fs
> implementation -- a few threads ago there was a project that allowed one to
> use Cassandra as the "file system" for hadoop, like people use s3. Not sure
> if that's still supported.
>
> On Wednesday, March 18, 2015, Issac Buenrostro <[email protected]> wrote:
>
> > Hello,
> >
> > Is there a way to write Parquet records to Cassandra? So far I have only
> > found logic for writing to Hadoop-compatible filesystems.
> >
> > I could see each page written to a different cell in Cassandra, with file
> > metadata and page headers written to separate cells in a different column
> > family. That way, we could leverage large Cassandra clusters, getting
> > highly parallel writes and low latency reads.
> >
> > Thank you.
> > Issac
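To make the proposal above concrete, here is a minimal sketch of the writer side. The class and method names (ParquetDataWriter, ParquetCassandraWriter, writeDataPage) come from the proposal itself and are not existing parquet-mr APIs; the in-memory maps standing in for the Data and Metadata column families, the writeFooter method, and the pageKey helper are hypothetical stand-ins for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Abstract base: ParquetFileWriter's page-writing methods become abstract,
// so backends other than Hadoop-compatible files can be plugged in.
abstract class ParquetDataWriter {
    abstract void writeDataPage(String file, int rowChunk, String column,
                                int page, byte[] data);
    abstract void writeFooter(String file, byte[] footer);

    // Reflection-based factory, so users can supply arbitrary writer classes
    // by name (e.g. from a config property).
    static ParquetDataWriter newInstance(String className) throws Exception {
        return (ParquetDataWriter) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }
}

// Cassandra-backed writer; plain maps stand in for the two column families.
class ParquetCassandraWriter extends ParquetDataWriter {
    final Map<String, byte[]> dataCF = new LinkedHashMap<>();
    final Map<String, byte[]> metadataCF = new LinkedHashMap<>();

    // Key scheme from the proposal: <parquet-file-name>:<row-chunk>:<column>:<page>
    static String pageKey(String file, int rowChunk, String column, int page) {
        return file + ":" + rowChunk + ":" + column + ":" + page;
    }

    @Override
    void writeDataPage(String file, int rowChunk, String column,
                       int page, byte[] data) {
        String key = pageKey(file, rowChunk, column, page);
        dataCF.put(key, data);                 // page bytes -> Data CF
        metadataCF.put(key, new byte[0]);      // page header stub -> Metadata CF
    }

    @Override
    void writeFooter(String file, byte[] footer) {
        metadataCF.put(file, footer);          // footer keyed by file name alone
    }
}

public class Sketch {
    public static void main(String[] args) throws Exception {
        ParquetDataWriter w =
                ParquetDataWriter.newInstance("ParquetCassandraWriter");
        w.writeDataPage("events.parquet", 0, "user_id", 0, new byte[]{1, 2});
        w.writeFooter("events.parquet", new byte[]{9});

        ParquetCassandraWriter cw = (ParquetCassandraWriter) w;
        System.out.println(cw.dataCF.containsKey("events.parquet:0:user_id:0"));
        System.out.println(cw.metadataCF.containsKey("events.parquet"));
    }
}
```

The reader side (ParquetDataReader with abstract readNextRowGroup/readFooter) would mirror this shape, looking up the same keys from the Metadata CF first and then fetching page cells from the Data CF.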
