[ https://issues.apache.org/jira/browse/PHOENIX-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349745#comment-16349745 ]
jifei_yang commented on PHOENIX-4490:
-------------------------------------
Hi [~karanmehta93], in our production environment I worked around the problem as follows:
1. Download the source for the corresponding Apache Phoenix version.
2. Modify the class org.apache.phoenix.spark.ConfigurationUtil so the Kerberos settings are applied when the Configuration is built (see the sketch below).
3. Add krb5.conf, your keytab, the Hadoop XML files, log4j.properties and hbase-site.xml to that module, so they end up on the classpath.
4. Add the Phoenix dependencies to /etc/spark/conf/classpath.txt on the cluster, so that every submitted Spark task can find the Phoenix dependency package.
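A minimal sketch of step 2, assuming a Kerberized HBase cluster; the helper object, principal and keytab arguments are illustrative, not the actual ConfigurationUtil API:
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.security.UserGroupInformation

object KerberosConfigHelper {
  // Hypothetical helper: build a Configuration that carries the Kerberos
  // settings instead of relying on a bare `new Configuration()`.
  def secureConfiguration(principal: String, keytab: String): Configuration = {
    // Picks up hbase-site.xml / core-site.xml shipped with the job (step 3).
    val conf = HBaseConfiguration.create()
    conf.set("hadoop.security.authentication", "kerberos")
    conf.set("hbase.security.authentication", "kerberos")
    // Log in from the keytab so the driver and executors can open
    // Phoenix/HBase connections.
    UserGroupInformation.setConfiguration(conf)
    UserGroupInformation.loginUserFromKeytab(principal, keytab)
    conf
  }
}
{code}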
> Phoenix Spark Module doesn't pass in user properties to create connection
> -------------------------------------------------------------------------
>
> Key: PHOENIX-4490
> URL: https://issues.apache.org/jira/browse/PHOENIX-4490
> Project: Phoenix
> Issue Type: Bug
> Reporter: Karan Mehta
> Priority: Major
>
> The Phoenix Spark module doesn't work perfectly in a Kerberos environment,
> because whenever a new {{PhoenixRDD}} is built, it is always built with a
> fresh, default set of properties. The following piece of code in
> {{PhoenixRelation}}, the class Spark uses to create a {{BaseRelation}} before
> executing a scan, is an example:
> {code}
> new PhoenixRDD(
>   sqlContext.sparkContext,
>   tableName,
>   requiredColumns,
>   Some(buildFilter(filters)),
>   Some(zkUrl),
>   new Configuration(),
>   dateAsTimestamp
> ).toDataFrame(sqlContext).rdd
> {code}
> This works fine in most cases when the Spark code runs on the same cluster as
> HBase, since the config object will pick up properties from the XML files on
> the classpath. However, in an external environment we should use the
> user-provided properties and merge them before creating any
> {{PhoenixRelation}} or {{PhoenixRDD}}. As per my understanding, the properties
> should ideally be provided in the {{DefaultSource#createRelation()}} method,
> for example as sketched below.
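> A minimal sketch of that idea, assuming the user options arrive as the
> {{parameters}} map of {{createRelation()}}; the merge helper is illustrative,
> not existing Phoenix code:
> {code}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.hbase.HBaseConfiguration
>
> // Hypothetical: copy the user-supplied options (e.g. Kerberos principal,
> // keytab, hbase-site overrides) into the Configuration handed to PhoenixRDD,
> // instead of passing a bare `new Configuration()`.
> def mergedConfiguration(parameters: Map[String, String]): Configuration = {
>   val conf = HBaseConfiguration.create()
>   parameters.foreach { case (k, v) => conf.set(k, v) }
>   conf
> }
>
> new PhoenixRDD(
>   sqlContext.sparkContext,
>   tableName,
>   requiredColumns,
>   Some(buildFilter(filters)),
>   Some(zkUrl),
>   mergedConfiguration(parameters),   // instead of new Configuration()
>   dateAsTimestamp
> ).toDataFrame(sqlContext).rdd
> {code}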
> An example of where this fails: Spark tries to get the splits, in order to
> optimize the MR performance for loading data from the table, in the
> {{PhoenixInputFormat#generateSplits()}} method. Ideally it should take all the
> config parameters from the {{JobContext}} being passed in, but it defaults to
> {{new Configuration()}} irrespective of what the user passed, and therefore
> fails to create a connection.
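> A short sketch of the distinction; everything except {{JobContext#getConfiguration()}}
> is an illustrative name, not the actual Phoenix code:
> {code}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.mapreduce.JobContext
>
> // What effectively happens today: a fresh Configuration that only sees
> // whatever XML files sit on the local classpath.
> def splitsConfigToday(context: JobContext): Configuration =
>   new Configuration()
>
> // What the description argues for: reuse the job's Configuration, which
> // already carries the user-provided (e.g. Kerberos) properties.
> def splitsConfigExpected(context: JobContext): Configuration =
>   context.getConfiguration
> {code}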
> [~jmahonin] [[email protected]]
> Any ideas or advice? Let me know if I am missing anything obvious here.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)