Re: pyspark get column family and qualifier names from hbase table
Hi, This is my code:

    import org.apache.hadoop.hbase.CellUtil
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.api.python.Converter

    /**
     * JF: convert a Result object into a string with column family and qualifier names,
     * e.g. 'columnfamily1:columnqualifier1:value1;columnfamily2:columnqualifier2:value2'.
     * Cells are separated by ';'; the family, qualifier, and value within each cell are
     * separated by ':'. Note that we don't need the row key here, because it has already
     * been converted by ImmutableBytesWritableToStringConverter.
     */
    class CustomHBaseResultToStringConverter extends Converter[Any, String] {
      override def convert(obj: Any): String = {
        val result = obj.asInstanceOf[Result]
        result.rawCells().map(cell =>
          List(Bytes.toString(CellUtil.cloneFamily(cell)),
               Bytes.toString(CellUtil.cloneQualifier(cell)),
               Bytes.toString(CellUtil.cloneValue(cell))).mkString(":")
        ).mkString(";")
      }
    }

I recommend using different delimiters (in place of ':' and ';') if your data can contain those characters. I am not a seasoned Scala programmer, so there may be a more flexible solution, for example making the delimiters dynamically assignable. I will try to open a PR, probably later today.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-tp18613p18744.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
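On the pyspark side, the string this converter emits has to be split back apart. A minimal sketch of such a parser (pure Python; the function name and sample data are hypothetical, and it assumes neither ':' nor ';' occurs in the stored data, per the delimiter caveat above):

```python
def parse_converted_value(s):
    """Split 'family:qualifier:value;family:qualifier:value;...' back into
    a list of (family, qualifier, value) tuples."""
    cells = []
    for part in s.split(";"):
        # Split on the first two ':' only, so a value may still contain text.
        family, qualifier, value = part.split(":", 2)
        cells.append((family, qualifier, value))
    return cells

# Example with made-up cells:
print(parse_converted_value("cf1:name:alice;cf1:age:30;cf2:score:9.5"))
# -> [('cf1', 'name', 'alice'), ('cf1', 'age', '30'), ('cf2', 'score', '9.5')]
```

In pyspark this would typically be mapped over the values of the RDD returned by newAPIHadoopRDD.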
Re: pyspark get column family and qualifier names from hbase table
Hi Nick, I saw that the HBase API has gone through lots of changes. If I remember correctly, the default HBase version in Spark 1.1.0 is 0.94.6; the one I am using is 0.98.1. To get the column family and qualifier names, we need to call different methods for these two versions. I don't know how to handle that... sorry...

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-tp18613p18749.html
Re: pyspark get column family and qualifier names from hbase table
Checked the source, found the following:

    class HBaseResultToStringConverter extends Converter[Any, String] {
      override def convert(obj: Any): String = {
        val result = obj.asInstanceOf[Result]
        Bytes.toStringBinary(result.value())
      }
    }

I feel that using 'result.value()' here is a big limitation, since it returns only a single value from the row. Converting from the full cell list of the 'Result' is more general and easier to use.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-tp18613p18619.html
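To see why this is limiting: Result.value() yields just one cell's value, with no family or qualifier attached, while walking the full cell list preserves every column. A toy illustration in plain Python (hypothetical data standing in for a multi-column Result, not the HBase API itself):

```python
# Hypothetical row with three cells, in the sorted order HBase stores them.
cells = [
    ("cf1", "age", "30"),
    ("cf1", "name", "alice"),
    ("cf2", "score", "9.5"),
]

# Roughly what result.value() gives you: one value, columns and families lost.
first_value = cells[0][2]

# What iterating over the full cell list gives you: everything, labelled.
all_cells = ";".join(":".join(cell) for cell in cells)

print(first_value)  # 30
print(all_cells)    # cf1:age:30;cf1:name:alice;cf2:score:9.5
```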
Re: pyspark get column family and qualifier names from hbase table
Just wrote a custom converter in Scala to replace HBaseResultToStringConverter. Just a couple of lines of code.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-tp18613p18639.html
Re: pyspark get column family and qualifier names from hbase table
Hey freedafeng, I'm exactly where you are. I want the output to show the row key and all column qualifiers that correspond to it. How did you write HBaseResultToStringConverter to do what you wanted it to do?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-tp18613p18650.html
Re: pyspark get column family and qualifier names from hbase table
Feel free to add that converter as an option in the Spark examples via a PR :)

— Sent from Mailbox

On Wed, Nov 12, 2014 at 3:27 AM, alaa contact.a...@gmail.com wrote: Hey freedafeng, I'm exactly where you are. I want the output to show the rowkey and all column qualifiers that correspond to it. How did you write HBaseResultToStringConverter to do what you wanted it to do?