[ https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291582#comment-14291582 ]
ASF GitHub Bot commented on FLINK-1433:
---------------------------------------

Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/337#issuecomment-71429315

It depends on what the user does with the `HADOOP_CLASSPATH`. As I understand it, the variable is meant for adding third-party JAR files to Hadoop. Hadoop's own JAR files are added to the `CLASSPATH` variable in the `libexec/hadoop-config.sh` script. There, the JAR directories referenced by variables like `HADOOP_COMMON_LIB_JARS_DIR`, `HDFS_LIB_JARS_DIR`, `YARN_LIB_JARS_DIR`, ... are added to the `CLASSPATH`. In the very last step, the script adds the `HADOOP_CLASSPATH` variable (by default at the end of the classpath, but there is an additional option to put it in front).

I found that we need this for Google Compute Engine's Hadoop deployment. Google Storage is configured there by default, but it currently doesn't work in non-YARN setups because the Google Storage JAR is not on our classpath. On these clusters, the `HADOOP_CLASSPATH` variable contains the path to the storage JAR.

> Add HADOOP_CLASSPATH to start scripts
> -------------------------------------
>
>                 Key: FLINK-1433
>                 URL: https://issues.apache.org/jira/browse/FLINK-1433
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Robert Metzger
>             Fix For: 0.8.1
>
>
> With the Hadoop file system wrapper, it's important to have access to the
> Hadoop filesystem classes.
> The HADOOP_CLASSPATH seems to be a standard environment variable used by
> Hadoop for such libraries.
> Deployments like Google Compute Cloud set this variable to contain the
> "Google Cloud Storage Hadoop Wrapper". So if users want to use the Cloud
> Storage in a non-YARN environment, we need to address this issue.
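The ordering rule described in the comment (user entries appended last by default, optionally placed first) can be sketched in shell. This is a minimal sketch, not Flink's actual start-script code: the function name `build_classpath` and the JAR paths below are hypothetical, and the prepend switch assumes Hadoop 2.x's `HADOOP_USER_CLASSPATH_FIRST` variable, which is how `hadoop-config.sh` exposes the "put it in front" option.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Flink): combine a base classpath with
# HADOOP_CLASSPATH, mirroring the ordering used by Hadoop 2.x hadoop-config.sh.
build_classpath() {
  local base="$1"

  # Nothing extra to add: return the base classpath unchanged.
  if [ -z "${HADOOP_CLASSPATH:-}" ]; then
    printf '%s\n' "$base"
    return
  fi

  # Avoid a leading/trailing ':' when the base classpath is empty.
  if [ -z "$base" ]; then
    printf '%s\n' "$HADOOP_CLASSPATH"
    return
  fi

  if [ -n "${HADOOP_USER_CLASSPATH_FIRST:-}" ]; then
    # Opt-in: user-supplied entries take precedence over the base classpath.
    printf '%s\n' "${HADOOP_CLASSPATH}:${base}"
  else
    # Default: user-supplied entries go at the end.
    printf '%s\n' "${base}:${HADOOP_CLASSPATH}"
  fi
}
```

For example, `HADOOP_CLASSPATH=/path/to/gcs-connector.jar build_classpath lib/flink.jar` would print `lib/flink.jar:/path/to/gcs-connector.jar` (both paths hypothetical). Appending by default keeps Hadoop's and Flink's own classes ahead of third-party JARs, so a storage connector cannot accidentally shadow core classes unless the user explicitly asks for it.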