Josh,

My production environment at our company is:

CDH 5.4.8
Hadoop 2.6.0-cdh5.4.8
YARN 2.6.0-cdh5.4.8
HBase 1.0.0-cdh5.4.8
Apache HBase 1.1.3
Spark 1.6.0
Phoenix 4.7.0
I tried to use the Phoenix Spark Plugin against both versions of HBase. I hope this helps.

Thanks,
Ben

> On Feb 20, 2016, at 7:37 AM, Josh Mahonin <jmaho...@gmail.com> wrote:
>
> Hi Ben,
>
> Can you describe in more detail what your environment is? Are you using stock installs of HBase, Spark and Phoenix? Are you using the hadoop2.4 pre-built Spark distribution as per the documentation [1]?
>
> The unread block data error is commonly traced back to this issue [2], which indicates some sort of mismatched version problem.
>
> Thanks,
>
> Josh
>
> [1] https://phoenix.apache.org/phoenix_spark.html
> [2] https://issues.apache.org/jira/browse/SPARK-1867
>
> On Fri, Feb 19, 2016 at 2:18 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
> Hi Josh,
>
> When I run the following code in spark-shell for Spark 1.6:
>
> import org.apache.phoenix.spark._
> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "TEST.MY_TEST", "zkUrl" -> "zk1,zk2,zk3:2181"))
> df.select(df("ID")).show()
>
> I get this error:
>
> java.lang.IllegalStateException: unread block data
>
> Thanks,
> Ben
>
>> On Feb 19, 2016, at 11:12 AM, Josh Mahonin <jmaho...@gmail.com> wrote:
>>
>> What specifically doesn't work for you?
>>
>> I have a Docker image that I used to do some basic testing with, and I haven't run into any problems:
>> https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark
>>
>> On Fri, Feb 19, 2016 at 12:40 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>> All,
>>
>> Thanks for the help. I have switched out Cloudera's HBase 1.0.0 with the current Apache HBase 1.1.3. Also, I installed Phoenix 4.7.0, and everything works fine except for the Phoenix Spark Plugin.
>> I wonder if it's a version incompatibility issue with Spark 1.6. Has anyone tried compiling 4.7.0 against Spark 1.6?
>>
>> Thanks,
>> Ben
>>
>>> On Feb 12, 2016, at 6:33 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>
>>> Does anyone know when Phoenix 4.7 will be officially released, and which Cloudera distribution versions it will be compatible with?
>>>
>>> Thanks,
>>> Ben
>>>
>>>> On Feb 10, 2016, at 11:03 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>
>>>> Hi Pierre,
>>>>
>>>> I am getting this error now:
>>>>
>>>> Error: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: SYSTEM.CATALOG,,1453397732623.8af7b44f3d7609eb301ad98641ff2611.: org.apache.hadoop.hbase.client.Delete.setAttribute(Ljava/lang/String;[B)Lorg/apache/hadoop/hbase/client/Delete;
>>>>
>>>> I even tried to use sqlline.py to do some queries, and it resulted in the same error. I followed the installation instructions. Is there something missing?
>>>>
>>>> Thanks,
>>>> Ben
>>>>
>>>>> On Feb 9, 2016, at 10:20 AM, Ravi Kiran <maghamraviki...@gmail.com> wrote:
>>>>>
>>>>> Hi Pierre,
>>>>>
>>>>> Try your luck building the artifacts from https://github.com/chiastic-security/phoenix-for-cloudera. Hopefully it helps.
>>>>>
>>>>> Regards,
>>>>> Ravi
>>>>>
>>>>> On Tue, Feb 9, 2016 at 10:04 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>> Hi Pierre,
>>>>>
>>>>> I found this article about how Cloudera's version of HBase is very different from Apache HBase, so Phoenix must be compiled against Cloudera's repo and versions. But I'm not having any success with it.
>>>>> http://stackoverflow.com/questions/31849454/using-phoenix-with-cloudera-hbase-installed-from-repo
>>>>>
>>>>> There's also a Chinese site that does the same thing:
>>>>>
>>>>> https://www.zybuluo.com/xtccc/note/205739
>>>>>
>>>>> I keep getting errors like the ones below:
>>>>>
>>>>> [ERROR] /opt/tools/phoenix/phoenix-core/src/main/java/org/apache/hadoop/hbase/regionserver/LocalIndexMerger.java:[110,29] cannot find symbol
>>>>> [ERROR] symbol: class Region
>>>>> [ERROR] location: class org.apache.hadoop.hbase.regionserver.LocalIndexMerger
>>>>> …
>>>>>
>>>>> Have you tried this also?
>>>>>
>>>>> As a last resort, we will have to abandon Cloudera's HBase for Apache's HBase.
>>>>>
>>>>> Thanks,
>>>>> Ben
>>>>>
>>>>>> On Feb 8, 2016, at 11:04 PM, pierre lacave <pie...@lacave.me> wrote:
>>>>>>
>>>>>> Haven't met that one.
>>>>>>
>>>>>> According to SPARK-1867, the real issue is hidden. I'd proceed by elimination; maybe try in local[*] mode first.
>>>>>>
>>>>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-1867
>>>>>>
>>>>>> On Tue, 9 Feb 2016, 04:58 Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>>> Pierre,
>>>>>>
>>>>>> I got it to work using phoenix-4.7.0-HBase-1.0-client-spark.jar.
>>>>>> But now I get this error:
>>>>>>
>>>>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, prod-dc1-datanode151.pdc1i.gradientx.com): java.lang.IllegalStateException: unread block data
>>>>>>
>>>>>> It happens when I do:
>>>>>>
>>>>>> df.show()
>>>>>>
>>>>>> Getting closer…
>>>>>>
>>>>>> Thanks,
>>>>>> Ben
>>>>>>
>>>>>>> On Feb 8, 2016, at 2:57 PM, pierre lacave <pie...@lacave.me> wrote:
>>>>>>>
>>>>>>> This is the wrong client jar. Try the one named phoenix-4.7.0-HBase-1.1-client-spark.jar.
>>>>>>>
>>>>>>> On Mon, 8 Feb 2016, 22:29 Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>>>> Hi Josh,
>>>>>>>
>>>>>>> I tried again by putting the settings within spark-defaults.conf:
>>>>>>>
>>>>>>> spark.driver.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
>>>>>>> spark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
>>>>>>>
>>>>>>> I still get the same error using the code below:
>>>>>>>
>>>>>>> import org.apache.phoenix.spark._
>>>>>>> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "TEST.MY_TEST", "zkUrl" -> "zk1,zk2,zk3:2181"))
>>>>>>>
>>>>>>> Can you tell me what else you're doing?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ben
>>>>>>>
>>>>>>>> On Feb 8, 2016, at 1:44 PM, Josh Mahonin <jmaho...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Ben,
>>>>>>>>
>>>>>>>> I'm not sure about the format of those command-line options you're passing. I've had success with spark-shell just by setting the 'spark.executor.extraClassPath' and 'spark.driver.extraClassPath' options in the Spark config, as per the docs [1].
>>>>>>>> I'm not sure if there's anything special needed for CDH or not, though. I also have a Docker image I've been toying with which has a working Spark/Phoenix setup using the Phoenix 4.7.0 RC and Spark 1.6.0. It might be a useful reference for you as well [2].
>>>>>>>>
>>>>>>>> Good luck,
>>>>>>>>
>>>>>>>> Josh
>>>>>>>>
>>>>>>>> [1] https://phoenix.apache.org/phoenix_spark.html
>>>>>>>> [2] https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark
>>>>>>>>
>>>>>>>> On Mon, Feb 8, 2016 at 4:29 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>>>>> Hi Pierre,
>>>>>>>>
>>>>>>>> I tried to run spark-shell with Spark 1.6.0 like this:
>>>>>>>>
>>>>>>>> spark-shell --master yarn-client --driver-class-path /opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar --driver-java-options "-Dspark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar"
>>>>>>>>
>>>>>>>> The version of HBase is the one in CDH 5.4.8, which is 1.0.0-cdh5.4.8.
>>>>>>>>
>>>>>>>> When I get to the line:
>>>>>>>>
>>>>>>>> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "TEST.MY_TEST", "zkUrl" -> "zk1,zk2,zk3:2181"))
>>>>>>>>
>>>>>>>> I get this error:
>>>>>>>>
>>>>>>>> java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
>>>>>>>>
>>>>>>>> Any ideas?
>>>>>>>> Thanks,
>>>>>>>> Ben
>>>>>>>>
>>>>>>>>> On Feb 5, 2016, at 1:36 PM, pierre lacave <pie...@lacave.me> wrote:
>>>>>>>>>
>>>>>>>>> I don't know when the full release will be; RC1 just got pulled, and RC2 is expected soon.
>>>>>>>>>
>>>>>>>>> You can find them here:
>>>>>>>>>
>>>>>>>>> https://dist.apache.org/repos/dist/dev/phoenix/
>>>>>>>>>
>>>>>>>>> There is a new phoenix-4.7.0-HBase-1.1-client-spark.jar that is all you need to have in the Spark classpath.
>>>>>>>>>
>>>>>>>>> Pierre Lacave
>>>>>>>>> 171 Skellig House, Custom House, Lower Mayor Street, Dublin 1, Ireland
>>>>>>>>> Phone: +353879128708
>>>>>>>>>
>>>>>>>>> On Fri, Feb 5, 2016 at 9:28 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>>>>>> Hi Pierre,
>>>>>>>>>
>>>>>>>>> When will I be able to download this version?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ben
>>>>>>>>>
>>>>>>>>> On Friday, February 5, 2016, pierre lacave <pie...@lacave.me> wrote:
>>>>>>>>> This was addressed in Phoenix 4.7 (currently in RC):
>>>>>>>>> https://issues.apache.org/jira/browse/PHOENIX-2503
>>>>>>>>>
>>>>>>>>> On Fri, Feb 5, 2016 at 6:17 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>>>>>> I cannot get this plugin to work in CDH 5.4.8 using Phoenix 4.5.2 and Spark 1.6.
>>>>>>>>> When I try to launch spark-shell, I get:
>>>>>>>>>
>>>>>>>>> java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>>>>>>>>
>>>>>>>>> I continue on and run the example code. When I get to the line below:
>>>>>>>>>
>>>>>>>>> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "TEST.MY_TEST", "zkUrl" -> "zookeeper1,zookeeper2,zookeeper3:2181"))
>>>>>>>>>
>>>>>>>>> I get this error:
>>>>>>>>>
>>>>>>>>> java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
>>>>>>>>>
>>>>>>>>> Can someone help?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ben
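
Pulling the thread together, the fix that emerges is: put the Phoenix client-spark jar that matches the HBase version actually running on both the driver and executor classpaths, then load the table through the phoenix-spark data source. A minimal sketch of that setup, assuming an HBase 1.1 cluster (the jar path, table name, and ZooKeeper quorum are illustrative, and this requires a live HBase/Phoenix cluster):

```
// spark-defaults.conf -- both driver and executors need the *-client-spark jar
// matching the running HBase version (HBase 1.1 here; paths are illustrative):
//   spark.driver.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.1-client-spark.jar
//   spark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.1-client-spark.jar

// Then, in spark-shell (e.g. launched with `spark-shell --master yarn-client`):
import org.apache.phoenix.spark._

// Load a Phoenix table as a DataFrame via the phoenix-spark data source
// (Spark 1.x sqlContext.load API, as used throughout the thread).
val df = sqlContext.load("org.apache.phoenix.spark",
  Map("table" -> "TEST.MY_TEST", "zkUrl" -> "zk1,zk2,zk3:2181"))

df.select(df("ID")).show()
```

When the jar does not match the cluster's HBase or Spark version, the mismatch tends to surface as serialization-level failures on the executors, such as the `java.lang.IllegalStateException: unread block data` seen above (per SPARK-1867), rather than as a clear version error.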