Thanks for the answer, Dmitriy. I am interested in implementing this
feature. My thought on how to do it is as follows:

- Extract an abstract ParquetDataWriter from ParquetFileWriter;
writeDictionaryPage and writeDataPage are abstract methods.
- ParquetFileWriter extends ParquetDataWriter, writing the data to
Hadoop-compatible files.
- ParquetCassandraWriter extends ParquetDataWriter, writing data to
Cassandra (rough sketch below):
-- for each page, metadata is written to the Metadata CF, with key
<parquet-file-name>:<row-chunk>:<column>:<page>
-- for each page, data is written to the Data CF, with the same key
<parquet-file-name>:<row-chunk>:<column>:<page>
-- the footer is written to the Metadata CF, with key <parquet-file-name>
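
A rough sketch of what I have in mind for the writer side (all class and
method names below are placeholders for discussion, not actual Parquet
APIs; I've simplified pages to raw byte arrays):

    import java.io.IOException;

    // Placeholder abstraction: each backend decides where page bytes go
    // (an HDFS file, a Cassandra cell, ...). Pages simplified to byte[].
    public abstract class ParquetDataWriter {

      public abstract void writeDictionaryPage(byte[] pageBytes) throws IOException;

      public abstract void writeDataPage(byte[] pageBytes) throws IOException;

      // Footer handling is also backend specific.
      public abstract void writeFooter(byte[] footerBytes) throws IOException;
    }

    // Illustrative key scheme for the Cassandra writer, as described above:
    //   <parquet-file-name>:<row-chunk>:<column>:<page>
    final class CassandraPageKey {
      static String of(String fileName, int rowChunk, String column, int page) {
        return fileName + ":" + rowChunk + ":" + column + ":" + page;
      }
    }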

- Likewise, extract an abstract ParquetDataReader from ParquetFileReader;
readNextRowGroup and readFooter are abstract methods. Chunk will also need
to be abstracted.
- ParquetFileReader extends ParquetDataReader, reading from
Hadoop-compatible files.
- ParquetCassandraReader extends ParquetDataReader, reading from
Cassandra (sketch below).
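
The reader side would mirror this; again just a placeholder sketch, with
the concrete page and row-group types elided:

    import java.io.IOException;

    // Placeholder abstraction for reading pages back from a backend.
    public abstract class ParquetDataReader {

      // Serialized footer bytes from wherever the backend stores them
      // (the Metadata CF for Cassandra, the end of the file for HDFS).
      public abstract byte[] readFooter() throws IOException;

      // Pages of the next row group, or null when there are no more.
      // The Chunk representation would also need to be abstracted here.
      public abstract byte[] readNextRowGroup() throws IOException;
    }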

- ParquetDataWriter and ParquetDataReader are instantiated through
reflection? (That way users could implement and plug in arbitrary readers
and writers; see the sketch below.)
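
For instantiation, something like the following could work (the property
name "parquet.data.writer.class" and the no-arg constructor requirement
are just assumptions for illustration):

    import org.apache.hadoop.conf.Configuration;

    public final class ParquetDataWriters {
      // Load the writer implementation named in the job configuration.
      public static ParquetDataWriter newWriter(Configuration conf)
          throws ReflectiveOperationException {
        String className = conf.get("parquet.data.writer.class");
        Class<? extends ParquetDataWriter> clazz =
            Class.forName(className).asSubclass(ParquetDataWriter.class);
        return clazz.getDeclaredConstructor().newInstance();
      }
    }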

Does this make sense?
Thanks!

On Wed, Mar 18, 2015 at 9:30 AM, Dmitriy Ryaboy <[email protected]> wrote:

> That's an interesting thought. No one has done this that I am aware of but
> I'd be curious to see what the results of doing this are.
>
> You could approximate something like this using the Cassandra fs
> implementation -- a while back there was a project that allowed one to
> use Cassandra as the "file system" for Hadoop, like people use S3. Not sure
> if that's still supported.
>
> On Wednesday, March 18, 2015, Issac Buenrostro <[email protected]> wrote:
>
> > Hello,
> >
> > Is there a way to write Parquet records to Cassandra? So far I have only
> > found logic for writing to Hadoop compatible filesystems.
> >
> > I could see each page written to a different cell in Cassandra, with file
> > metadata and page headers written to separate cells in a different column
> > family. That way, we could leverage large Cassandra clusters, getting
> > highly parallel writes and low latency reads.
> >
> > Thank you.
> > Issac
> >
>
