Hi Edmon,

I would start by picking one of Avro, Thrift or Protobuf to describe a schema for this data:
http://avro.apache.org/docs/current/#schemas
https://developers.google.com/protocol-buffers/
http://thrift.apache.org/docs/idl
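
To make the schema step concrete, here is a rough sketch of what a very generic Avro schema for pipe-delimited segments could look like, together with a small parser that produces records of that shape. Everything named here (Hl7Message, Segment, segment_id, fields, parse_message, the example.hl7 namespace and the sample message) is made up for illustration. A real schema would more likely model each segment type with its own named fields, so treat this only as a starting point.

```python
import json

# Hypothetical Avro schema, written as a Python dict so it can be dumped to a
# .avsc file. One record per message, holding an ordered list of segments.
MESSAGE_SCHEMA = {
    "type": "record",
    "name": "Hl7Message",
    "namespace": "example.hl7",
    "fields": [
        {
            "name": "segments",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "Segment",
                    "fields": [
                        {"name": "segment_id", "type": "string"},
                        {"name": "fields", "type": {"type": "array", "items": "string"}},
                    ],
                },
            },
        }
    ],
}


def parse_message(raw, field_sep="|"):
    """Split a raw message into a dict shaped like MESSAGE_SCHEMA."""
    segments = []
    # Segments are normally terminated by carriage returns; accept newlines too.
    for line in raw.replace("\n", "\r").split("\r"):
        line = line.strip()
        if not line:
            continue
        # The first token is the segment name (MSH, PID, ...), the rest are fields.
        segment_id, *fields = line.split(field_sep)
        segments.append({"segment_id": segment_id, "fields": fields})
    return {"segments": segments}


if __name__ == "__main__":
    # Invented two-segment sample, not taken from a real message.
    sample = "MSH|^~\\&|SENDER|FACILITY\rPID|1||12345"
    print(json.dumps(MESSAGE_SCHEMA, indent=2))
    print(json.dumps(parse_message(sample), indent=2))
```

The schema dict can be saved as JSON and handed to whichever Avro tooling you use. The sketch does not handle component separators, repetitions or escape sequences inside a field.
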
From there you can write to Parquet using the appropriate integration:
https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/test/java/org/apache/parquet/avro/TestSpecificReadWrite.java
https://github.com/apache/parquet-mr/blob/master/parquet-protobuf/src/test/java/org/apache/parquet/proto/ProtoInputOutputFormatTest.java
https://github.com/apache/parquet-mr/blob/master/parquet-thrift/src/test/java/org/apache/parquet/hadoop/thrift/TestInputOutputFormat.java

Julien

On Thu, Aug 27, 2015 at 7:23 PM, Edmon Begoli <[email protected]> wrote:

> This might be more of a question for Parquet folks here than Drill-ers, but
> nevertheless:
>
> I would like to be able to convert EDI HL7 v.2 messages into Parquet
> representation, and make them amenable to Drill querying.
> (Here is a sample claim message 837p in HL7 representation (page 8):
> http://www.vitahealth.org/Modules/ShowDocument2.aspx?documentid=545 )
>
> This is a lengthy topic which I could discuss in details, but for now I
> would like to just know where and how to get started.
>
> Thank you,
> Edmon
>

--
Julien
