GitHub user xiaopeng-liao opened a pull request: https://github.com/apache/phoenix/pull/196
[PHOENIX-2648] Add dynamic column support for Spark integration

It supports dynamic columns for both RDD and DataFrame read/write.

Things needing consideration
======
When loading from a DataFrame, the Catalyst data types need to be converted to Phoenix types, e.g. StringType to VARCHAR, Array<Integer> to INTEGER_ARRAY, etc. The code is under phoenix-spark/src/main/scala/org/apache/phoenix/spark/DataFrameFunctions.scala.

Usages
======
- **RDD**

**Save**
```
val dataSet = List((1L, "1", 1, 1), (2L, "2", 2, 2), (3L, "3", 3, 3))
sc
  .parallelize(dataSet)
  .saveToPhoenix(
    "OUTPUT_TEST_TABLE",
    Seq("ID", "COL1", "COL2", "COL4<INTEGER"),
    hbaseConfiguration
  )
```

**Read**
```
val columnNames = Seq("ID", "COL1", "COL2", "COL5<INTEGER")
// Load the results back
val loaded = sc.phoenixTableAsRDD(
  "OUTPUT_TEST_TABLE", columnNames, conf = hbaseConfiguration
)
```

- **DataFrame**

**Save**

It will take the data types from the DataFrame and convert them to Phoenix-supported types.
```
val dataSet = List((1L, "1", 1, 1, "2"), (2L, "2", 2, 2, "3"), (3L, "3", 3, 3, "4"))
sc
  .parallelize(dataSet).toDF("ID", "COL1", "COL2", "COL6", "COL7")
  .saveToPhoenix("OUTPUT_TEST_TABLE", zkUrl = Some(quorumAddress))
```

**Read**
```
val df1 = sqlContext.phoenixTableAsDataFrame(
  "OUTPUT_TEST_TABLE",
  Array("ID", "COL1", "COL6<INTEGER", "COL7<VARCHAR"),
  conf = hbaseConfiguration
)
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xiaopeng-liao/phoenix phoenix-addsparkdynamic

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/196.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #196

----

commit a2dc6101d96333f781ff9e905c47c035f8b89462
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-17T12:13:58Z

    add dynamic column support for SPARK rdd

commit 6969287db5ea341bc3876af55f7d0ef3acb035c2
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-18T09:46:38Z
    add dynamic column support for reading from PhoenixRDD.

commit 5688b6c90c66b02cc22fcac6e67b9712d7eb660e
Author: xiaopeng-liao <xp.em.l...@gmail.com>
Date:   2016-08-19T14:52:27Z

    Merge pull request #1 from apache/master

    merge in latest changes from phoenix

commit a9b217e55393f613e9ca168faccd93e7626c7324
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-23T10:51:34Z

    [PHOENIX-2648] add support for dynamic columns for RDD and Dataframe

commit 51190865375397581cbd1d6b960c79be7d727b97
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-23T10:52:27Z

    Merge branch 'phoenix-addsparkdynamic' of https://github.com/xiaopeng-liao/phoenix into phoenix-addsparkdynamic

commit 6cbd6314782a6eb1a4c69eae25371791e4d64f90
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-23T13:00:55Z

    Remove the configuration for enable dynamic column as it is not used anyway

commit 8602554c875229f376499c082894cc33999f3e7b
Author: xiaopeng liao <xiaopeng liao>
Date:   2016-08-25T08:44:47Z

    [PHOENIX-2648] change dynamic column format from COL:DataType to COL<DataType becaues it conflict with index syntax

----
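As a rough illustration of the Catalyst-to-Phoenix type conversion the PR describes (the real code lives in DataFrameFunctions.scala; the object, method names, and the exact set of mappings below are hypothetical, not taken from the patch), the idea could be sketched in plain Scala as:

```scala
// Hypothetical sketch: map a Catalyst type name (e.g. DataType.simpleString)
// to a Phoenix SQL type, and render a dynamic column in the PR's
// COL<DataType format. Only the StringType -> VARCHAR and
// Array<Integer> -> INTEGER_ARRAY pairs come from the PR description.
object CatalystToPhoenix {
  def phoenixTypeFor(catalystType: String): String = catalystType match {
    case "string"          => "VARCHAR"
    case "long" | "bigint" => "BIGINT"
    case "int" | "integer" => "INTEGER"
    case "double"          => "DOUBLE"
    case "array<int>"      => "INTEGER_ARRAY"
    case "array<string>"   => "VARCHAR_ARRAY"
    case other             => sys.error(s"Unsupported Catalyst type: $other")
  }

  // Dynamic columns use COL<DataType (changed from COL:DataType because the
  // colon conflicted with Phoenix index syntax, per the commit log above).
  def dynamicColumn(name: String, catalystType: String): String =
    s"$name<${phoenixTypeFor(catalystType)}"
}
```

For example, `CatalystToPhoenix.dynamicColumn("COL6", "int")` would produce the `COL6<INTEGER` spec used in the DataFrame read example above.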