Github user Hexiaoqiao commented on a diff in the pull request:

    
https://github.com/apache/incubator-carbondata/pull/611#discussion_r104309761
  
    --- Diff: docs/installation-guide.md ---
    @@ -92,77 +96,87 @@ To get started with CarbonData : [Quick 
Start](quick-start-guide.md), [DDL Opera
     
        The following steps are only for Driver Nodes. (Driver nodes are the 
one which starts the spark context.)
     
    -* [Build the 
CarbonData](https://cwiki.apache.org/confluence/display/CARBONDATA/Building+CarbonData+And+IDE+Configuration)
 project and get the assembly jar from 
"./assembly/target/scala-2.10/carbondata_xxx.jar" and put in the 
``"<SPARK_HOME>/carbonlib"`` folder.
    +1. [Build the 
CarbonData](https://github.com/apache/incubator-carbondata/blob/master/build/README.md)
 project and get the assembly jar from 
`./assembly/target/scala-2.1x/carbondata_xxx.jar` and copy to 
`<SPARK_HOME>/carbonlib` folder.
     
    -      NOTE: Create the carbonlib folder if it does not exists inside 
``"<SPARK_HOME>"`` path.
    +    **NOTE**: Create the carbonlib folder if it does not exist inside `<SPARK_HOME>` path.
     
    -* Copy "carbonplugins" folder to ``"<SPARK_HOME>/carbonlib"`` folder from 
"./processing/" folder of CarbonData repository.
    -      carbonplugins will contain .kettle folder.
    +2. Copy the `./processing/carbonplugins` folder from CarbonData repository 
to `<SPARK_HOME>/carbonlib/` folder.
     
    -* Copy the "carbon.properties.template" to 
``"<SPARK_HOME>/conf/carbon.properties"`` folder from conf folder of CarbonData 
repository.
    -* Modify the parameters in "spark-default.conf" located in the 
``"<SPARK_HOME>/conf``"
    +    **NOTE**: carbonplugins will contain .kettle folder.
     
    -| Property | Description | Value |
    
-|---------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|
    -| spark.master | Set this value to run the Spark in yarn cluster mode. | 
Set "yarn-client" to run the Spark in yarn cluster mode. |
    -| spark.yarn.dist.files | Comma-separated list of files to be placed in 
the working directory of each executor. 
|``"<YOUR_SPARK_HOME_PATH>"/conf/carbon.properties`` |
    -| spark.yarn.dist.archives | Comma-separated list of archives to be 
extracted into the working directory of each executor. 
|``"<YOUR_SPARK_HOME_PATH>"/carbonlib/carbondata_xxx.jar`` |
    -| spark.executor.extraJavaOptions | A string of extra JVM options to pass 
to executors. For instance  NOTE: You can enter multiple values separated by 
space. 
|``-Dcarbon.properties.filepath="<YOUR_SPARK_HOME_PATH>"/conf/carbon.properties``
 |
    -| spark.executor.extraClassPath | Extra classpath entries to prepend to 
the classpath of executors. NOTE: If SPARK_CLASSPATH is defined in 
spark-env.sh, then comment it and append the values in below parameter 
spark.driver.extraClassPath 
|``"<YOUR_SPARK_HOME_PATH>"/carbonlib/carbonlib/carbondata_xxx.jar`` |
    -| spark.driver.extraClassPath | Extra classpath entries to prepend to the 
classpath of the driver. NOTE: If SPARK_CLASSPATH is defined in spark-env.sh, 
then comment it and append the value in below parameter 
spark.driver.extraClassPath. 
|``"<YOUR_SPARK_HOME_PATH>"/carbonlib/carbonlib/carbondata_xxx.jar`` |
    -| spark.driver.extraJavaOptions | A string of extra JVM options to pass to 
the driver. For instance, GC settings or other logging. 
|``-Dcarbon.properties.filepath="<YOUR_SPARK_HOME_PATH>"/conf/carbon.properties``
 |
    -| carbon.kettle.home | Path that will be used by CarbonData internally to 
create graph for loading the data. 
|``"<YOUR_SPARK_HOME_PATH>"/carbonlib/carbonplugins`` |
    +3. Copy the `./conf/carbon.properties.template` file from CarbonData 
repository to `<SPARK_HOME>/conf/` folder and rename the file to 
`carbon.properties`.
     
    -* Add the following properties in ``<SPARK_HOME>/conf/ carbon.properties``:
    +4. Create a `tar.gz` file of the carbonlib folder and move it inside the carbonlib folder.
     
    -| Property | Required | Description | Example | Default Value |
    
-|----------------------|----------|----------------------------------------------------------------------------------------|-------------------------------------|---------------|
    -| carbon.storelocation | NO | Location where CarbonData will create the 
store and write the data in its own format. | 
hdfs://HOSTNAME:PORT/Opt/CarbonStore | Propose to set HDFS directory|
    -| carbon.kettle.home | YES | Path that will be used by CarbonData 
internally to create graph for loading the data. | 
$SPARK_HOME/carbonlib/carbonplugins |  |
    +```
    +   cd <SPARK_HOME>
    +   tar -zcvf carbondata.tar.gz carbonlib/
    +   mv carbondata.tar.gz carbonlib/
    +```
     
    +5. Configure the properties mentioned in the following table in 
`<SPARK_HOME>/conf/spark-defaults.conf` file.
     
    -* Verify the installation.
    +   | Property | Description | Value |
    +   
|---------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|
    +   | spark.master | Set this value to run the Spark in yarn cluster mode. 
| Set yarn-client to run the Spark in yarn cluster mode. |
    +   | spark.yarn.dist.files | Comma-separated list of files to be placed in 
the working directory of each executor. |"<SPARK_HOME>"/conf/carbon.properties |
    +   | spark.yarn.dist.archives | Comma-separated list of archives to be 
extracted into the working directory of each executor. 
|"<SPARK_HOME>"/carbonlib/carbondata.tar.gz |
    +   | spark.executor.extraJavaOptions | A string of extra JVM options to 
pass to executors. For instance  **NOTE**: You can enter multiple values 
separated by space. |-Dcarbon.properties.filepath=carbon.properties |
    +   | spark.executor.extraClassPath | Extra classpath entries to prepend to 
the classpath of executors. **NOTE**: If SPARK_CLASSPATH is defined in 
spark-env.sh, then comment it and append the values in below parameter 
spark.driver.extraClassPath |carbondata.tar.gz/carbonlib/* |
    +   | spark.driver.extraClassPath | Extra classpath entries to prepend to 
the classpath of the driver. **NOTE**: If SPARK_CLASSPATH is defined in 
spark-env.sh, then comment it and append the value in below parameter 
spark.driver.extraClassPath. |"<SPARK_HOME>"/carbonlib/carbonlib/* |
    +   | spark.driver.extraJavaOptions | A string of extra JVM options to pass 
to the driver. For instance, GC settings or other logging. 
|-Dcarbon.properties.filepath="<SPARK_HOME>"/conf/carbon.properties |
     
    -```
    +
    +6. Add the following properties in `<SPARK_HOME>/conf/carbon.properties`:
    +
    +   | Property | Required | Description | Example | Default Value |
    +   
|----------------------|----------|----------------------------------------------------------------------------------------|-------------------------------------|---------------|
    +   | carbon.storelocation | NO | Location where CarbonData will create the 
store and write the data in its own format. | 
hdfs://HOSTNAME:PORT/Opt/CarbonStore | Propose to set HDFS directory|
    +   | carbon.kettle.home | YES | Path that will be used by CarbonData 
internally to create graph for loading the data. | 
carbondata.tar.gz/carbonlib/carbonplugins |  |
    +
    +
    +7. Verify the installation.
    +
    +   ```
          ./bin/spark-shell --master yarn-client --driver-memory 1g
          --executor-cores 2 --executor-memory 2G
    -```
    -  NOTE: Make sure you have permissions for CarbonData JARs and files 
through which driver and executor will start.
    +   ```
    +  **NOTE**: Make sure you have permissions for CarbonData JARs and files 
through which driver and executor will start.
     
       Getting started with CarbonData : [Quick Start](quick-start-guide.md), 
[DDL Operations on CarbonData](ddl-operation-on-carbondata.md)
     
     ## Query Execution Using CarbonData Thrift Server
     
    -### Starting CarbonData Thrift Server
    +### Starting CarbonData Thrift Server.
     
    -   a. cd ``<SPARK_HOME>``
    +   a. cd `<SPARK_HOME>`
     
        b. Run the following command to start the CarbonData thrift server.
     
    -```
    -./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true
    ---class org.apache.carbondata.spark.thriftserver.CarbonThriftServer
    -$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
    -```
    +   ```
    +   ./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true
    +   --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer
    +   <SPARK_HOME>/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
    +   ```
     
    -| Parameter | Description | Example |
    
-|---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
    -| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the 
``"<SPARK_HOME>"/carbonlib/`` folder. | 
carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar |
    -| carbon_store_path | This is a parameter to the CarbonThriftServer class. 
This a HDFS path where CarbonData files will be kept. Strongly Recommended to 
put same as carbon.storelocation parameter of carbon.properties. | 
``hdfs//<host_name>:54310/user/hive/warehouse/carbon.store`` |
    +   | Parameter | Description | Example |
    --- End diff --
    
    The syntax of this table may not be correct; please click the `view` button located at the top-right corner to check and review.
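    One quick way to spot the kind of malformed markdown table row mentioned above is to compare the number of cells on each pipe-delimited line; a minimal sketch (`table_column_counts` is an illustrative helper, not an existing tool):

    ```python
    def table_column_counts(lines):
        """Return the number of cells found on each pipe-delimited row."""
        counts = []
        for line in lines:
            line = line.strip()
            if not line.startswith("|"):
                continue
            # drop the empty strings produced by the leading and trailing pipes
            cells = line.split("|")[1:-1]
            counts.append(len(cells))
        return counts

    table = [
        "| Parameter | Description | Example |",
        "|-----------|-------------|---------|",
        "| CARBON_ASSEMBLY_JAR | assembly jar name | carbondata_xxx.jar |",
    ]
    # a well-formed table has the same cell count on every row
    assert len(set(table_column_counts(table))) == 1
    ```

    A row whose count differs from the header's is the one that breaks rendering.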

