GitHub user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/337#issuecomment-71429315
  
    It depends on what the user does with `HADOOP_CLASSPATH`.
    As I understand it, the variable is meant for adding third-party jar 
files to Hadoop. Hadoop's own jar files are added to the `CLASSPATH` variable 
in the `libexec/hadoop-config.sh` script. There, you can see variables such as 
`HADOOP_COMMON_LIB_JARS_DIR`, `HDFS_LIB_JARS_DIR`, `YARN_LIB_JARS_DIR`, ... 
being added to `CLASSPATH`. In the very last step, the script adds the 
`HADOOP_CLASSPATH` variable (by default at the end of the classpath, but there 
is an additional option to put it at the front).
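
    The ordering described above can be sketched roughly as follows. This is 
an illustration, not a copy of `hadoop-config.sh`: the function name 
`append_hadoop_classpath` is made up, and `HADOOP_USER_CLASSPATH_FIRST` is 
assumed to be the switch that moves `HADOOP_CLASSPATH` to the front.

```shell
# Sketch only: mimics the ordering described above, not the real script.
# append_hadoop_classpath is a hypothetical helper name.
# HADOOP_USER_CLASSPATH_FIRST is assumed to be the "put it in front" option.

# Usage: append_hadoop_classpath BASE_CLASSPATH
# Prints BASE_CLASSPATH combined with $HADOOP_CLASSPATH (if that is set).
append_hadoop_classpath() {
  base="$1"
  if [ -z "${HADOOP_CLASSPATH:-}" ]; then
    # Nothing to add: leave the classpath unchanged.
    printf '%s\n' "$base"
  elif [ -n "${HADOOP_USER_CLASSPATH_FIRST:-}" ]; then
    # User entries go in front of Hadoop's own jars.
    printf '%s\n' "${HADOOP_CLASSPATH}:${base}"
  else
    # Default: user entries go at the end.
    printf '%s\n' "${base}:${HADOOP_CLASSPATH}"
  fi
}
```

    With `HADOOP_CLASSPATH=/opt/extra.jar` (a placeholder path) and the 
front option unset, `append_hadoop_classpath /a.jar` would print 
`/a.jar:/opt/extra.jar`.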
    
    I found that we need to add `HADOOP_CLASSPATH` on Google Compute Engine's 
Hadoop deployment. Google Storage is configured there by default, but it 
currently does not work in non-YARN setups because the Google Storage jar is 
not on our classpath. On these clusters, the `HADOOP_CLASSPATH` variable 
contains the path to the storage jar.
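
    To check whether a jar such as the storage jar actually ends up on a 
given classpath, a small helper along these lines might be useful. The 
function name `classpath_contains` and the glob pattern in the usage note are 
hypothetical; the real jar name on those clusters may differ.

```shell
# Sketch only: classpath_contains is a hypothetical helper name.
# Usage: classpath_contains CLASSPATH PATTERN
# Succeeds if the file name of any colon-separated entry matches the glob PATTERN.
# The body runs in a subshell, so the IFS and globbing changes do not leak out.
classpath_contains() (
  IFS=':'
  set -f   # do not expand wildcard entries such as lib/* while splitting
  for entry in $1; do
    case "${entry##*/}" in
      $2) exit 0 ;;
    esac
  done
  exit 1
)
```

    For example, `classpath_contains "$HADOOP_CLASSPATH" '*storage*.jar'` 
would succeed only if some entry's file name matches that pattern.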


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
