[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2576


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207250192
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,64 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+S3 is an Object Storage API on cloud, it is recommended for storing large data files. You can use 
+this feature if you want to store data on Amazon cloud or Huawei cloud(OBS).
+Since the data is stored on to cloud there are no restrictions on the size of data and the data can be accessed from anywhere at any time.
+Carbondata can support any Object Storage that conforms to Amazon S3 API.
--- End diff --

merged


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207250096
  
--- Diff: docs/configuration-parameters.md ---
@@ -106,7 +106,10 @@ This section provides the details of all the configurations required for CarbonD
 
|-|--|-|
 | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed buffer size is 10485760 byte. |
 | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. |
-| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3.
+| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to 
+be created. Recommended to configure HDFS lock path(to this property) in case of S3 file system 
+as locking is not feasible on S3. 
+**Note:** If this property is not set to HDFS location for S3 store, then there is a possibility of data corruption. 
--- End diff --

done
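
For concreteness, the setup recommended in this thread could look as follows in the carbon.properties file. This is a hedged sketch: the bucket name, namenode address and lock directory are placeholders, not values from the PR.

```
# Hypothetical carbon.properties entries for an S3-backed store.
# Locks are redirected to HDFS because file-based locking is not feasible on S3.
carbon.storelocation=s3a://mybucket/carbonstore
carbon.lock.type=HDFSLOCK
carbon.lock.path=hdfs://namenode:8020/carbon/locks
```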


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207250066
  
--- Diff: docs/configuration-parameters.md ---
@@ -106,7 +106,10 @@ This section provides the details of all the configurations required for CarbonD
 
|-|--|-|
 | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed buffer size is 10485760 byte. |
 | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. |
-| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3.
+| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to 
--- End diff --

added description


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207249973
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -730,6 +736,8 @@ Users can specify which columns to include and exclude for local dictionary gene
   * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
   * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
   * The maximum number of characters per column is 32000. If there are more than 32000 characters in a column, data loading will fail.
+  * Since Bad Records Path can be specified in both create, load and carbon properties. 
--- End diff --

done
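
As a hedged illustration of the bad records path being discussed, a load command might supply it as a DML option. The table name and HDFS paths below are placeholders; the option names follow the CarbonData DML documentation.

```
// Sketch only: supplying a bad records path at load time (Scala, via spark.sql).
// `spark` is assumed to be a CarbonSession-enabled SparkSession.
spark.sql(
  """
    |LOAD DATA INPATH 'hdfs://namenode:8020/data/sample.csv'
    |INTO TABLE db1.table1
    |OPTIONS(
    |  'BAD_RECORDS_LOGGER_ENABLE'='true',
    |  'BAD_RECORDS_ACTION'='REDIRECT',
    |  'BAD_RECORD_PATH'='hdfs://namenode:8020/carbon/badrecords')
  """.stripMargin)
```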


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207249849
  
--- Diff: docs/datamap/preaggregate-datamap-guide.md ---
@@ -7,6 +24,7 @@
 * [Querying Data](#querying-data)
 * [Compaction](#compacting-pre-aggregate-tables)
 * [Data Management](#data-management-with-pre-aggregate-tables)
+* [Limitations](#Limitations)
--- End diff --

removed


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207249941
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,63 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+Amazon S3 is a cloud storage service that is recommended for storing large data files. You can 
+use this feature if you want to store data on amazon cloud. Since the data is stored on to cloud 
+storage there are no restrictions on the size of data and the data can be accessed from anywhere at any time.
+Carbon can support any Object store that conforms to Amazon S3 API. 
+
+#Writing to Object Store
+To store carbondata files on to Object Store location, you need to set `carbon
+.storelocation` property to Object Store path in CarbonProperties file. For example, carbon
+.storelocation=s3a://mybucket/carbonstore. By setting this property, all the tables will be created on the specified Object Store path.
+
+If your existing store is HDFS, and you want to store specific tables on S3 location, then `location` parameter has to be set during create 
+table. 
+For example:
+
+```
+CREATE TABLE IF NOT EXISTS db1.table1(col1 string, col2 int) STORED AS carbondata LOCATION 's3a://mybucket/carbonstore'
+``` 
+
+For more details on create table, Refer [data-management-on-carbondata](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#create-table)
+
+#Authentication
+You need to set authentication properties to store the carbondata files on to S3 location. For 
+more details on authentication properties, refer 
+[hadoop authentication document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties)
+
+Another way of setting the authentication parameters is as follows:
+
+```
+ SparkSession
+ .builder()
+ .master(masterURL)
+ .appName("S3Example")
+ .config("spark.driver.host", "localhost")
+ .config("spark.hadoop.fs.s3a.access.key", "")
+ .config("spark.hadoop.fs.s3a.secret.key", "")
+ .config("spark.hadoop.fs.s3a.endpoint", "1.1.1.1")
+ .getOrCreateCarbonSession()
+```
+
+#Recommendations
+1. Object stores like S3 does not support file leasing mechanism(supported by HDFS) that is 
+required to take locks which ensure consistency between concurrent operations therefore, it is 
+recommended to set the configurable lock path property([carbon.lock.path](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md#miscellaneous-configuration))
+ to a HDFS directory.
+2. As Object stores are eventual consistent meaning that any put request can take some time to reflect when trying to list objects from that bucket therefore concurrent queries are not supported. 
--- End diff --

done
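
Pulling the quoted guide's pieces together, an end-to-end flow might look like the sketch below. All bucket names, credentials and table names are placeholders, and `getOrCreateCarbonSession(storePath)` is assumed from the CarbonData 1.4.x Spark integration (`import org.apache.spark.sql.CarbonSession._`).

```
// Hedged sketch: a CarbonSession with an S3 default store, plus a table
// created explicitly at an S3 location. All values are placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val spark = SparkSession
  .builder()
  .master("local[*]")
  .appName("S3Example")
  .config("spark.hadoop.fs.s3a.access.key", "<access-key>")
  .config("spark.hadoop.fs.s3a.secret.key", "<secret-key>")
  .getOrCreateCarbonSession("s3a://mybucket/carbonstore")

spark.sql(
  """
    |CREATE TABLE IF NOT EXISTS db1.table1(col1 STRING, col2 INT)
    |STORED AS carbondata
    |LOCATION 's3a://mybucket/carbonstore'
  """.stripMargin)
```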


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207249910
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,63 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+Amazon S3 is a cloud storage service that is recommended for storing large data files. You can 
--- End diff --

done


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207249485
  
--- Diff: docs/configuration-parameters.md ---
@@ -106,7 +106,12 @@ This section provides the details of all the configurations required for CarbonD
 
|-|--|-|
 | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed buffer size is 10485760 byte. |
 | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. |
-| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3.
+| carbon.lock.path | TABLEPATH | Locks on the files are used to prevent concurrent operation from modifying the same files. This 
+configuration specifies the path where lock files have to be created. Recommended to configure 
+HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. 
+**Note:** If this property is not set to HDFS location for S3 store, then there is a possibility 
+of data corruption because multiple data manipulation calls might try to update the status file 
+and as lock is not acquired before updation data might get overwritten.
--- End diff --

added
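
Besides the carbon.properties file, the lock settings under discussion can also be applied programmatically. A minimal hedged sketch, assuming the CarbonProperties API from carbondata-core (the HDFS path is a placeholder):

```
// Sketch only: pointing CarbonData locks at an HDFS directory when the store is on S3.
// The namenode address and lock directory are placeholders.
import org.apache.carbondata.core.util.CarbonProperties

CarbonProperties.getInstance()
  .addProperty("carbon.lock.type", "HDFSLOCK")
  .addProperty("carbon.lock.path", "hdfs://namenode:8020/carbon/locks")
```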


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread sgururajshetty
Github user sgururajshetty commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207223686
  
--- Diff: docs/configuration-parameters.md ---
@@ -106,7 +106,12 @@ This section provides the details of all the configurations required for CarbonD
 
|-|--|-|
 | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed buffer size is 10485760 byte. |
 | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. |
-| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3.
+| carbon.lock.path | TABLEPATH | Locks on the files are used to prevent concurrent operation from modifying the same files. This 
+configuration specifies the path where lock files have to be created. Recommended to configure 
+HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. 
+**Note:** If this property is not set to HDFS location for S3 store, then there is a possibility 
+of data corruption because multiple data manipulation calls might try to update the status file 
+and as lock is not acquired before updation data might get overwritten.
--- End diff --

since it is a table, end the line with a pipe character `|`



---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207072493
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,64 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+S3 is an Object Storage API on cloud, it is recommended for storing large data files. You can use 
+this feature if you want to store data on Amazon cloud or Huawei cloud(OBS).
+Since the data is stored on to cloud there are no restrictions on the size of data and the data can be accessed from anywhere at any time.
+Carbondata can support any Object Storage that conforms to Amazon S3 API.
--- End diff --

This sentence can be merged with the above sentence "You can use this feature if you want to store data ..."


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207071826
  
--- Diff: docs/configuration-parameters.md ---
@@ -106,7 +106,10 @@ This section provides the details of all the configurations required for CarbonD
 
|-|--|-|
 | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed buffer size is 10485760 byte. |
 | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. |
-| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3.
+| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to 
--- End diff --

Add a brief description as to why locks are used in CarbonData. What is TABLEPATH?


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207073807
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,64 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+S3 is an Object Storage API on cloud, it is recommended for storing large data files. You can use 
+this feature if you want to store data on Amazon cloud or Huawei cloud(OBS).
+Since the data is stored on to cloud there are no restrictions on the size of data and the data can be accessed from anywhere at any time.
+Carbondata can support any Object Storage that conforms to Amazon S3 API.
+
+#Writing to Object Storage
+To store carbondata files on to Object Store location, you need to set `carbon
+.storelocation` property to Object Store path in CarbonProperties file. For example, carbon
+.storelocation=s3a://mybucket/carbonstore. By setting this property, all the tables will be created on the specified Object Store path.
+
+If your existing store is HDFS, and you want to store specific tables on S3 location, then `location` parameter has to be set during create 
--- End diff --

If you don't wish to change the existing store location and want to store only specific tables on S3, it can be done by setting the 'location' option parameter in the CREATE TABLE DDL command.


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207072154
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -730,6 +736,8 @@ Users can specify which columns to include and exclude for local dictionary gene
   * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
   * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
   * The maximum number of characters per column is 32000. If there are more than 32000 characters in a column, data loading will fail.
+  * Since Bad Records Path can be specified in both create, load and carbon properties. 
--- End diff --

The entire sentence needs to be rephrased; it is not a grammatically correct statement.


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207071907
  
--- Diff: docs/configuration-parameters.md ---
@@ -106,7 +106,10 @@ This section provides the details of all the configurations required for CarbonD
 
|-|--|-|
 | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed buffer size is 10485760 byte. |
 | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. |
-| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3.
+| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to 
+be created. Recommended to configure HDFS lock path(to this property) in case of S3 file system 
+as locking is not feasible on S3. 
+**Note:** If this property is not set to HDFS location for S3 store, then there is a possibility of data corruption. 
--- End diff --

can add a brief sentence as to why corruption might happen


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207074600
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,64 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+S3 is an Object Storage API on cloud, it is recommended for storing large data files. You can use 
+this feature if you want to store data on Amazon cloud or Huawei cloud(OBS).
+Since the data is stored on to cloud there are no restrictions on the size of data and the data can be accessed from anywhere at any time.
+Carbondata can support any Object Storage that conforms to Amazon S3 API.
+
+#Writing to Object Storage
+To store carbondata files on to Object Store location, you need to set `carbon
+.storelocation` property to Object Store path in CarbonProperties file. For example, carbon
+.storelocation=s3a://mybucket/carbonstore. By setting this property, all the tables will be created on the specified Object Store path.
+
+If your existing store is HDFS, and you want to store specific tables on S3 location, then `location` parameter has to be set during create 
+table. 
+For example:
+
+```
+CREATE TABLE IF NOT EXISTS db1.table1(col1 string, col2 int) STORED AS carbondata LOCATION 's3a://mybucket/carbonstore'
+``` 
+
+For more details on create table, Refer [data-management-on-carbondata](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#create-table)
+
+#Authentication
+You need to set authentication properties to store the carbondata files on to S3 location. For 
+more details on authentication properties, refer 
+[hadoop authentication document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties)
+
+Another way of setting the authentication parameters is as follows:
+
+```
+ SparkSession
+ .builder()
+ .master(masterURL)
+ .appName("S3Example")
+ .config("spark.driver.host", "localhost")
+ .config("spark.hadoop.fs.s3a.access.key", "")
+ .config("spark.hadoop.fs.s3a.secret.key", "")
+ .config("spark.hadoop.fs.s3a.endpoint", "1.1.1.1")
+ .getOrCreateCarbonSession()
+```
+
+#Recommendations
+1. Object Storage like S3 does not support file leasing mechanism(supported by HDFS) that is 
+required to take locks which ensure consistency between concurrent operations therefore, it is 
+recommended to set the configurable lock path property([carbon.lock.path](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md#miscellaneous-configuration))
+ to a HDFS directory.
+2. As Object Storage are eventual consistent meaning that any put request can take some time to 
--- End diff --

Concurrent data manipulation operations are not supported. Object stores follow eventual consistency semantics, i.e., any put request might take some time to be reflected when trying to list. This behaviour means the data read is not guaranteed to be consistent or the latest.


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-07-31 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r206515601
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -730,6 +736,8 @@ Users can specify which columns to include and exclude for local dictionary gene
   * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
   * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
   * The maximum number of characters per column is 32000. If there are more than 32000 characters in a column, data loading will fail.
+  * Since Bad Records Path can be specified in both create, load and carbon properties. 
--- End diff --

"both" does not suite in this statement. Please rewrite.


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-07-31 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r206481821
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,63 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+Amazon S3 is a cloud storage service that is recommended for storing large data files. You can 
+use this feature if you want to store data on amazon cloud. Since the data is stored on to cloud 
+storage there are no restrictions on the size of data and the data can be accessed from anywhere at any time.
+Carbon can support any Object store that conforms to Amazon S3 API. 
+
+#Writing to Object Store
+To store carbondata files on to Object Store location, you need to set `carbon
+.storelocation` property to Object Store path in CarbonProperties file. For example, carbon
+.storelocation=s3a://mybucket/carbonstore. By setting this property, all the tables will be created on the specified Object Store path.
+
+If your existing store is HDFS, and you want to store specific tables on S3 location, then `location` parameter has to be set during create 
+table. 
+For example:
+
+```
+CREATE TABLE IF NOT EXISTS db1.table1(col1 string, col2 int) STORED AS carbondata LOCATION 's3a://mybucket/carbonstore'
+``` 
+
+For more details on create table, Refer [data-management-on-carbondata](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#create-table)
+
+#Authentication
+You need to set authentication properties to store the carbondata files on to S3 location. For 
+more details on authentication properties, refer 
+[hadoop authentication document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties)
+
+Another way of setting the authentication parameters is as follows:
+
+```
+ SparkSession
+ .builder()
+ .master(masterURL)
+ .appName("S3Example")
+ .config("spark.driver.host", "localhost")
+ .config("spark.hadoop.fs.s3a.access.key", "")
+ .config("spark.hadoop.fs.s3a.secret.key", "")
+ .config("spark.hadoop.fs.s3a.endpoint", "1.1.1.1")
+ .getOrCreateCarbonSession()
+```
+
+#Recommendations
+1. Object stores like S3 does not support file leasing mechanism(supported by HDFS) that is 
+required to take locks which ensure consistency between concurrent operations therefore, it is 
+recommended to set the configurable lock path property([carbon.lock.path](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md#miscellaneous-configuration))
+ to a HDFS directory.
+2. As Object stores are eventual consistent meaning that any put request can take some time to reflect when trying to list objects from that bucket therefore concurrent queries are not supported. 
--- End diff --

Change to: Object Storage


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-07-31 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r206481369
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,63 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+Amazon S3 is a cloud storage service that is recommended for storing large data files. You can 
--- End diff --

Suggest changing to:

S3 is an object storage API on cloud, it is recommended for storing large data files. You can use this feature if you want to store data on Amazon cloud or Huawei cloud (OBS). Since the data is stored on cloud storage there are no restrictions on the size of data and the data can be accessed from anywhere at any time. CarbonData can support any object storage that conforms to the Amazon S3 API.


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-07-31 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r206480055
  
--- Diff: docs/datamap/preaggregate-datamap-guide.md ---
@@ -7,6 +24,7 @@
 * [Querying Data](#querying-data)
 * [Compaction](#compacting-pre-aggregate-tables)
 * [Data Management](#data-management-with-pre-aggregate-tables)
+* [Limitations](#Limitations)
--- End diff --

Why do we need to add this item?


---