[ https://issues.apache.org/jira/browse/PHOENIX-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349745#comment-16349745 ]
jifei_yang commented on PHOENIX-4490:
-------------------------------------
Hi [~karanmehta93], in our production environment I worked around the problem as follows:
1. Download the source for the corresponding Apache Phoenix version.
2. Modify the class org.apache.phoenix.spark.ConfigurationUtil so the Kerberos settings are applied when the Configuration is built (see the sketch below).
3. Add krb5.conf, your keytab, the Hadoop XML files, log4j.properties and hbase-site.xml to that module, so they end up on the classpath.
4. Add the Phoenix dependencies to /etc/spark/conf/classpath.txt on the cluster, so that every submitted Spark task can find the Phoenix dependency package.
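A minimal sketch of step 2, assuming a Kerberized HBase cluster; the helper object, principal and keytab arguments are illustrative, not the actual ConfigurationUtil API:
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.security.UserGroupInformation

object KerberosConfigHelper {
  // Hypothetical helper: build a Configuration that carries the Kerberos
  // settings instead of relying on a bare `new Configuration()`.
  def secureConfiguration(principal: String, keytab: String): Configuration = {
    // Picks up hbase-site.xml / core-site.xml shipped with the job (step 3).
    val conf = HBaseConfiguration.create()
    conf.set("hadoop.security.authentication", "kerberos")
    conf.set("hbase.security.authentication", "kerberos")
    // Log in from the keytab so the driver and executors can open
    // Phoenix/HBase connections.
    UserGroupInformation.setConfiguration(conf)
    UserGroupInformation.loginUserFromKeytab(principal, keytab)
    conf
  }
}
{code}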
> Phoenix Spark Module doesn't pass in user properties to create connection
> -------------------------------------------------------------------------
>
> Key: PHOENIX-4490
> URL: https://issues.apache.org/jira/browse/PHOENIX-4490
> Project: Phoenix
> Issue Type: Bug
> Reporter: Karan Mehta
> Priority: Major
>
> The Phoenix Spark module doesn't work perfectly in a Kerberos environment,
> because whenever a new {{PhoenixRDD}} is built, it is always built with a
> fresh, default set of properties. The following piece of code in
> {{PhoenixRelation}}, the class Spark uses to create a {{BaseRelation}} before
> executing a scan, is an example:
> {code}
> new PhoenixRDD(
>   sqlContext.sparkContext,
>   tableName,
>   requiredColumns,
>   Some(buildFilter(filters)),
>   Some(zkUrl),
>   new Configuration(),
>   dateAsTimestamp
> ).toDataFrame(sqlContext).rdd
> {code}
> This works fine in most cases when the Spark code runs on the same cluster as
> HBase, since the config object will pick up properties from the XML files on
> the classpath. However, in an external environment we should use the
> user-provided properties and merge them before creating any
> {{PhoenixRelation}} or {{PhoenixRDD}}. As per my understanding, the properties
> should ideally be provided in the {{DefaultSource#createRelation()}} method,
> for example as sketched below.
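> A minimal sketch of that idea, assuming the user options arrive as the
> {{parameters}} map of {{createRelation()}}; the merge helper is illustrative,
> not existing Phoenix code:
> {code}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.hbase.HBaseConfiguration
>
> // Hypothetical: copy the user-supplied options (e.g. Kerberos principal,
> // keytab, hbase-site overrides) into the Configuration handed to PhoenixRDD,
> // instead of passing a bare `new Configuration()`.
> def mergedConfiguration(parameters: Map[String, String]): Configuration = {
>   val conf = HBaseConfiguration.create()
>   parameters.foreach { case (k, v) => conf.set(k, v) }
>   conf
> }
>
> new PhoenixRDD(
>   sqlContext.sparkContext,
>   tableName,
>   requiredColumns,
>   Some(buildFilter(filters)),
>   Some(zkUrl),
>   mergedConfiguration(parameters),   // instead of new Configuration()
>   dateAsTimestamp
> ).toDataFrame(sqlContext).rdd
> {code}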
> An example of where this fails: Spark tries to get the splits, in order to
> optimize the MR performance for loading data from the table, in the
> {{PhoenixInputFormat#generateSplits()}} method. Ideally it should take all the
> config parameters from the {{JobContext}} being passed in, but it defaults to
> {{new Configuration()}} irrespective of what the user passed, and therefore
> fails to create a connection.
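> A short sketch of the distinction; everything except {{JobContext#getConfiguration()}}
> is an illustrative name, not the actual Phoenix code:
> {code}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.mapreduce.JobContext
>
> // What effectively happens today: a fresh Configuration that only sees
> // whatever XML files sit on the local classpath.
> def splitsConfigToday(context: JobContext): Configuration =
>   new Configuration()
>
> // What the description argues for: reuse the job's Configuration, which
> // already carries the user-provided (e.g. Kerberos) properties.
> def splitsConfigExpected(context: JobContext): Configuration =
>   context.getConfiguration
> {code}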
> [~jmahonin] [[email protected]]
> Any ideas or advice? Let me know if I am missing anything obvious here.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)