date:20200327

[GitHub] [carbondata] dependabot[bot] commented on issue #3447: Bump dep.jackson.version from 2.6.5 to 2.10.1 in /store/sdk

2020-03-27 Thread GitBox

dependabot[bot] commented on issue #3447: Bump dep.jackson.version from 2.6.5 
to 2.10.1 in /store/sdk
URL: https://github.com/apache/carbondata/pull/3447#issuecomment-605373307
 
 
   Dependabot tried to update this pull request, but something went wrong. 
We're looking into it, but in the meantime you can retry the update by 
commenting `@dependabot rebase`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [carbondata] dependabot[bot] commented on issue #3456: Bump solr.version from 6.3.0 to 8.3.0 in /datamap/lucene

2020-03-27 Thread GitBox

dependabot[bot] commented on issue #3456: Bump solr.version from 6.3.0 to 8.3.0 
in /datamap/lucene
URL: https://github.com/apache/carbondata/pull/3456#issuecomment-605373310
 
 
   Dependabot tried to update this pull request, but something went wrong. 
We're looking into it, but in the meantime you can retry the update by 
commenting `@dependabot rebase`.



[GitHub] [carbondata] asfgit closed pull request #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI

2020-03-27 Thread GitBox

asfgit closed pull request #3685: [HOTFIX] Fix all flink test case failure and 
enable UT in CI
URL: https://github.com/apache/carbondata/pull/3685
 
 
   



[GitHub] [carbondata] niuge01 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI

2020-03-27 Thread GitBox

niuge01 commented on issue #3685: [HOTFIX] Fix all flink test case failure and 
enable UT in CI
URL: https://github.com/apache/carbondata/pull/3685#issuecomment-605371290
 
 
   LGTM



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case 
failure and enable UT in CI
URL: https://github.com/apache/carbondata/pull/3685#issuecomment-605280758
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2583/
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case 
failure and enable UT in CI
URL: https://github.com/apache/carbondata/pull/3685#issuecomment-605267135
 
 
   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/875/
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user 
guid to doc
URL: https://github.com/apache/carbondata/pull/3520#issuecomment-605217739
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2582/
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user 
guid to doc
URL: https://github.com/apache/carbondata/pull/3520#issuecomment-605214533
 
 
   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/874/
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue 
with respect to segmentMetadaInfo after update and clean files
URL: https://github.com/apache/carbondata/pull/3683#issuecomment-605205781
 
 
   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/871/
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue 
with respect to segmentMetadaInfo after update and clean files
URL: https://github.com/apache/carbondata/pull/3683#issuecomment-605197577
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2579/
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue 
with respect to segmentMetadaInfo after update and clean files
URL: https://github.com/apache/carbondata/pull/3683#issuecomment-605131953
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2575/
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue 
with respect to segmentMetadaInfo after update and clean files
URL: https://github.com/apache/carbondata/pull/3683#issuecomment-605130361
 
 
   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/867/
   



[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc

2020-03-27 Thread GitBox

VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399400512
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,109 @@
+
+
+# What is spatial index
+
+[A spatial index](https://gistbok.ucgis.org/topic-keywords/indexing) is a data 
structure that allows for accessing a spatial object efficiently. It is a 
common technique used by spatial databases.  Without indexing, any search for a 
feature would require a "sequential scan" of every record in the database, 
resulting in much longer processing time. In a spatial index construction 
process, the minimum bounding rectangle serves as an object approximation. 
Various types of spatial indices across commercial and open-source databases 
yield measurable performance differences. Spatial indexing techniques are 
playing a central role in time-critical applications and the manipulation of 
spatial big data.
+
+
+
+# How does carbondata implement spatial index
+
+There are many open-source implementations of spatial indexing and spatial query processing. CarbonData takes a different approach, built on raster data. A raster is a matrix of cells organized into rows and columns (called a grid). Each cell represents a coordinate, and the index for that coordinate is generated from its longitude and latitude, as in the [Z order curve](https://en.wikipedia.org/wiki/Z-order_curve).
+
+CarbonData rasterizes the user data into segments during data load. A set of latitudes and longitudes represents a grid range, and the size of the grid can be configured. Hence, the coordinates loaded are often discrete rather than continuous.
+
+The figure below shows the relationship between the grid and the points residing in it. The black point represents the center point of the grid, and the red points are coordinates at arbitrary positions inside the grid. The red points can be replaced by the center point of the grid to indicate that they lie within the grid. During data load, CarbonData generates an index for each coordinate according to the row and column of the grid (in the raster) where that coordinate lies. These indexes are the same as Z order. For the detailed conversion algorithm, please refer to the design documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates. CarbonData uses the polygon, together with the spatial region information passed in when the table was created, to build a quad tree. The nodes in the quad tree are composed of hash ids generated from the row and column information projected in the polygon area. When the query polygon area is not disjoint from the grid center point, the grid is considered selected. In the following figure, the user selects a quadrilateral polygon, and the grids whose center points lie in the region generate a quadtree. The query process generates a list of lines with continuous properties, like [97->97  99->99  102->102  104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  225->225  228->229]; each entry in the list represents a continuous grid area. CarbonData uses that line list to prune and filter. Further details can be found under https://issues.apache.org/jira/browse/CARBONDATA-3548
+
+![File Directory Structure](../docs/images/spatial-index-2.png?raw=true)
+
+
+
+# Installation and Deployment
+
+Build the source with the geo module enabled. You can open "pom.xml" to check whether the module has been enabled.
+
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)
+
+The build produces "carbondata-geo-2.0.0-SNAPSHOT.jar". Copy this jar and 'jst-core.jar' to your carbonlib path.
+
+## Basic Command
+
+### Create Table
+
+A spatial index needs the source columns and other regional information to be specified. CarbonData will create an invisible hash id column.
+
+Example:
+
+```sql
+create table source_index(id BIGINT, latitude long, longitude long) stored by 
'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash',  
+'INDEX_HANDLER.mygeohash.type'='geohash',   
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',   
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',   
+'INDEX_HANDLER.mygeohash.gridSize'='50',   
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',   
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',   
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',   
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',   
+'INDEX_HANDLER.mygeohash.conversionRatio'='100');
 
 Review comment:
   It is already described in the table properties description below. I think it is not required here again.
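
The guide quoted in this review says each coordinate is mapped to a grid cell and that the cell's index follows a Z order curve. As a rough illustration of that general technique only, and not CarbonData's actual conversion algorithm (which the guide defers to the design documents), the Python sketch below maps a point to a grid row and column and interleaves their bits. The names `grid_cell` and `z_order`, the `cell_deg` parameter, and the choice of putting column bits in the even positions are all assumptions made for the example.

```python
import math


def grid_cell(lat, lon, min_lat, min_lon, cell_deg):
    """Return the (row, col) of the grid cell containing a point.

    Hypothetical helper: cell_deg is an assumed cell size in degrees,
    not one of the table properties from the guide.
    """
    row = math.floor((lat - min_lat) / cell_deg)
    col = math.floor((lon - min_lon) / cell_deg)
    return row, col


def z_order(row, col, bits=16):
    """Interleave the low `bits` bits of row and col into one integer.

    Column bits land in even positions and row bits in odd positions
    (an assumed convention), so nearby cells tend to get nearby codes.
    """
    code = 0
    for b in range(bits):
        code |= ((col >> b) & 1) << (2 * b)
        code |= ((row >> b) & 1) << (2 * b + 1)
    return code


if __name__ == "__main__":
    row, col = grid_cell(10.25, 20.75, 10.0, 20.0, 0.5)
    print(row, col, z_order(row, col))
```

Every point inside the same cell gets the same code, which is the sense in which the loaded coordinates become discrete.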



[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc

2020-03-27 Thread GitBox

VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399400743
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,109 @@
+
+
+When querying, the user enters the true space polygon coordinates. CarbonData uses the polygon, together with the spatial region information passed in when the table was created, to build a quad tree. The nodes in the quad tree are composed of hash ids generated from the row and column information projected in the polygon area. When the query polygon area is not disjoint from the grid center point, the grid is considered selected. In the following figure, the user selects a quadrilateral polygon, and the grids whose center points lie in the region generate a quadtree. The query process generates a list of lines with continuous properties, like [97->97  99->99  102->102  104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  225->225  228->229]; each entry in the list represents a continuous grid area. CarbonData uses that line list to prune and filter. Further details can be found under https://issues.apache.org/jira/browse/CARBONDATA-3548
 
 Review comment:
   Added
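
Entries such as 104->111 in the line list quoted above can be read as inclusive runs of consecutive grid ids. As a minimal sketch of that reading, and not CarbonData's actual pruning code, the Python below collapses a set of selected ids into such runs and tests whether a given id falls inside one of them. The names `to_ranges` and `in_ranges` are hypothetical, and treating each `a->b` entry as an inclusive range is an assumption.

```python
from bisect import bisect_right


def to_ranges(ids):
    """Collapse ids into sorted, inclusive (start, end) runs of consecutive values."""
    ranges = []
    for i in sorted(set(ids)):
        if ranges and i == ranges[-1][1] + 1:
            # Extend the current run by one id.
            ranges[-1] = (ranges[-1][0], i)
        else:
            # Start a new run.
            ranges.append((i, i))
    return ranges


def in_ranges(ranges, cell_id):
    """Binary-search the sorted runs for one that contains cell_id."""
    starts = [start for start, _ in ranges]
    idx = bisect_right(starts, cell_id) - 1
    return idx >= 0 and cell_id <= ranges[idx][1]


if __name__ == "__main__":
    selected = [97, 99, 102] + list(range(104, 112))
    runs = to_ranges(selected)
    print(" ".join(f"{a}->{b}" for a, b in runs))
    print(in_ranges(runs, 110), in_ranges(runs, 103))
```

Keeping the runs sorted lets a membership check use binary search instead of scanning every selected id.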



[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc

2020-03-27 Thread GitBox

VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399399530
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,109 @@
+
+
+## Basic Command
+
+### Create Table
+
+A spatial index needs the source columns and other regional information to be specified. CarbonData will create an invisible hash id column.
+
+Example:
+
+```sql
+create table source_index(id BIGINT, latitude long, longitude long) stored by 
'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash',  
+'INDEX_HANDLER.mygeohash.type'='geohash',   
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',   
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',   
+'INDEX_HANDLER.mygeohash.gridSize'='50',   
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',   
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',   
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',   
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',   
+'INDEX_HANDLER.mygeohash.conversionRatio'='100');
+```
+
+| Name | Value | Description |
 
 Review comment:
   Modified



[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc

2020-03-27 Thread GitBox

VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399399033
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,109 @@
+
+
+# Installation and Deployment
 
 Review comment:
   Modified



[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc

2020-03-27 Thread GitBox

VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399398817
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,109 @@
+
+
+# Installation and Deployment
+
+Build the source with the geo module enabled. You can open "pom.xml" to check whether the module has been enabled.
+
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)
 
 Review comment:
   Modified



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files

2020-03-27 Thread GitBox

akashrn5 commented on a change in pull request #3683: [CARBONDATA-3755]Fix 
clean up issue with respect to segmentMetadaInfo after update and clean files
URL: https://github.com/apache/carbondata/pull/3683#discussion_r399380210
 
 

 ##
 File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
 ##
 @@ -579,10 +580,28 @@ public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) thr
 }
 String UUID = String.valueOf(System.currentTimeMillis());
 List segmentFilesToBeUpdatedLatest = new ArrayList<>();
+CarbonFile segmentFilesLocation =
+FileFactory.getCarbonFile(CarbonTablePath.getSegmentFilesLocation(table.getTablePath()));
 for (Segment segment : segmentFilesToBeUpdated) {
-  String file =
-  SegmentFileStore.writeSegmentFile(table, segment.getSegmentNo(), UUID);
-  segmentFilesToBeUpdatedLatest.add(new Segment(segment.getSegmentNo(), file));
+  SegmentFileStore fileStore =
+  new SegmentFileStore(table.getTablePath(), segment.getSegmentFileName());
+  segment.setSegmentMetaDataInfo(fileStore.getSegmentFile().getSegmentMetaDataInfo());
 
 Review comment:
   added



[GitHub] [carbondata] ydvpankaj99 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files

2020-03-27 Thread GitBox

ydvpankaj99 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with 
respect to segmentMetadaInfo after update and clean files
URL: https://github.com/apache/carbondata/pull/3683#issuecomment-605046083
 
 
   retest this please



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case 
failure and enable UT in CI
URL: https://github.com/apache/carbondata/pull/3685#issuecomment-605032353
 
 
   Build Failed  with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/865/
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case 
failure and enable UT in CI
URL: https://github.com/apache/carbondata/pull/3685#issuecomment-605030448
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2573/
   



[GitHub] [carbondata] jackylk commented on issue #3684: [CARBONDATA-3756] Fix stage query bug it only read the first blocklet of each carbondata file

2020-03-27 Thread GitBox

jackylk commented on issue #3684: [CARBONDATA-3756] Fix stage query bug it only 
read the first blocklet of each carbondata file
URL: https://github.com/apache/carbondata/pull/3684#issuecomment-605002288
 
 
   LGTM



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files

2020-03-27 Thread GitBox

Indhumathi27 commented on a change in pull request #3683: [CARBONDATA-3755]Fix 
clean up issue with respect to segmentMetadaInfo after update and clean files
URL: https://github.com/apache/carbondata/pull/3683#discussion_r399261249
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
 ##
 @@ -579,10 +580,28 @@ public static void cleanUpDeltaFiles(CarbonTable table, 
boolean forceDelete) thr
 }
 String UUID = String.valueOf(System.currentTimeMillis());
 List segmentFilesToBeUpdatedLatest = new ArrayList<>();
+CarbonFile segmentFilesLocation =
+
FileFactory.getCarbonFile(CarbonTablePath.getSegmentFilesLocation(table.getTablePath()));
 for (Segment segment : segmentFilesToBeUpdated) {
-  String file =
-  SegmentFileStore.writeSegmentFile(table, segment.getSegmentNo(), 
UUID);
-  segmentFilesToBeUpdatedLatest.add(new Segment(segment.getSegmentNo(), 
file));
+  SegmentFileStore fileStore =
+  new SegmentFileStore(table.getTablePath(), 
segment.getSegmentFileName());
+  
segment.setSegmentMetaDataInfo(fileStore.getSegmentFile().getSegmentMetaDataInfo());
 
 Review comment:
   Can you please add a test case for this issue, if possible?



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float 
stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604995090
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2572/
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float 
stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604992269
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/864/
   



[GitHub] [carbondata] ajantha-bhat commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI

2020-03-27 Thread GitBox

ajantha-bhat commented on issue #3685: [HOTFIX] Fix all flink test case failure 
and enable UT in CI
URL: https://github.com/apache/carbondata/pull/3685#issuecomment-604967884
 
 
   @niuge01 , @jackylk : please check



[GitHub] [carbondata] ajantha-bhat opened a new pull request #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI

2020-03-27 Thread GitBox

ajantha-bhat opened a new pull request #3685: [HOTFIX] Fix all flink test case 
failure and enable UT in CI
URL: https://github.com/apache/carbondata/pull/3685
 
 
### Why is this PR needed?
   After #3628, the default BATCH_FILE_ORDER is wrong [it is neither ASC nor DSC],
   so all the test cases in the flink module fail because no order is set in the 
stage command.
   Also, the flink UTs are not running in CI, so the failure was not caught.

### What changes were proposed in this PR?
   Set the default value to ascending order.
   Enable UT runs for the flink module.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
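
The defaulting behaviour described above can be illustrated with a minimal sketch. It is not the code from this PR: the class `BatchFileOrderSketch`, the enum `Order`, and the method `parseOrder` are hypothetical names, and the accepted spelling `DESC` is an assumption. The point is only that a missing or blank option falls back to ascending order, while an unknown value is rejected instead of silently producing "no order".

```java
import java.util.Locale;

// Hypothetical illustration only; none of these names come from CarbonData.
final class BatchFileOrderSketch {

    enum Order { ASC, DESC }

    // Returns the order to use for stage batch files.
    // A null or blank option falls back to ascending order.
    static Order parseOrder(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Order.ASC;
        }
        switch (value.trim().toUpperCase(Locale.ROOT)) {
            case "ASC":
                return Order.ASC;
            case "DESC": // assumed spelling of the descending option
                return Order.DESC;
            default:
                // Fail fast rather than continue with an undefined order.
                throw new IllegalArgumentException("Unsupported batch file order: " + value);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrder(null));
        System.out.println(parseOrder("desc"));
    }
}
```

Rejecting unknown values is a design choice of this sketch; a unit test that runs in CI, as this PR enables for the flink module, would catch a wrong default either way.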
   
   
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3684: [CARBONDATA-3756] Fix stage query bug it only read the first blocklet of each carbondata file

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3684: [CARBONDATA-3756] Fix stage query bug 
it only read the first blocklet of each carbondata file
URL: https://github.com/apache/carbondata/pull/3684#issuecomment-604943790
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2571/
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3684: [CARBONDATA-3756] Fix stage query bug it only read the first blocklet of each carbondata file

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3684: [CARBONDATA-3756] Fix stage query bug 
it only read the first blocklet of each carbondata file
URL: https://github.com/apache/carbondata/pull/3684#issuecomment-604941008
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/863/
   



[GitHub] [carbondata] brijoobopanna commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector

2020-03-27 Thread GitBox

brijoobopanna commented on issue #3682: [CARBONDATA-3753] optimize double/float 
stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604935581
 
 
   retest this please
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user 
guid to doc
URL: https://github.com/apache/carbondata/pull/3520#issuecomment-604894818
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2570/
   



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc

2020-03-27 Thread GitBox

CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user 
guid to doc
URL: https://github.com/apache/carbondata/pull/3520#issuecomment-604892466
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/862/
   



[GitHub] [carbondata] QiangCai opened a new pull request #3684: [CARBONDATA-3756] Fix stage query bug it only read the first blocklet of each carbondata file

2020-03-27 Thread GitBox

QiangCai opened a new pull request #3684: [CARBONDATA-3756] Fix stage query bug 
it only read the first blocklet of each carbondata file
URL: https://github.com/apache/carbondata/pull/3684
 
 
   
### Why is this PR needed?
   The query of stage files reads only the first blocklet of each carbondata 
file.
   So when a file contains multiple blocklets, the query result is wrong.

### What changes were proposed in this PR?
   The query of stage files should read all blocklets of all carbondata 
files.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
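
The difference between the buggy and the intended behaviour can be sketched as follows. This is an illustration under stated assumptions, not the reader code changed by this PR: a data file is modelled as a plain list of blocklets, each blocklet as a list of rows, and the names `BlockletScanSketch`, `readFirstBlockletOnly`, and `readAllBlocklets` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: a file is a list of blocklets, a blocklet is a list of rows.
final class BlockletScanSketch {

    // Mirrors the reported bug: rows after the first blocklet are silently dropped.
    static List<String> readFirstBlockletOnly(List<List<String>> file) {
        List<String> rows = new ArrayList<String>();
        if (!file.isEmpty()) {
            rows.addAll(file.get(0));
        }
        return rows;
    }

    // Intended behaviour: visit every blocklet of the file.
    static List<String> readAllBlocklets(List<List<String>> file) {
        List<String> rows = new ArrayList<String>();
        for (List<String> blocklet : file) {
            rows.addAll(blocklet);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> file = Arrays.asList(
            Arrays.asList("r1", "r2"),
            Arrays.asList("r3"));
        System.out.println(readFirstBlockletOnly(file));
        System.out.println(readAllBlocklets(file));
    }
}
```

A regression test for this fix would need a stage file with more than one blocklet, since a single-blocklet file gives the same result on both paths.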
   
   
   



[jira] [Created] (CARBONDATA-3756) the query of stage files only read the first blocklet of each carbondata file

2020-03-27 Thread David Cai (Jira)

David Cai created CARBONDATA-3756:
-

 Summary: the query of stage files only read the first blocklet of 
each carbondata file
 Key: CARBONDATA-3756
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3756
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai


The query of stage files reads only the first blocklet of each carbondata file.

If a file contains multiple blocklets, the query result is wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (CARBONDATA-3755) segment metadata info is not copied to new segment files after update and clean files

2020-03-27 Thread Akash R Nilugal (Jira)

Akash R Nilugal created CARBONDATA-3755:
---

 Summary: segment metadata info is not copied to new segment files 
after update and clean files
 Key: CARBONDATA-3755
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3755
 Project: CarbonData
  Issue Type: Bug
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal


Segment metadata info is not copied to the new segment files after update and clean 
files.

Old segment files are also not deleted.




[jira] [Created] (CARBONDATA-3754) old data and index files not getting cleared after SIrebuild

2020-03-27 Thread Akash R Nilugal (Jira)

Akash R Nilugal created CARBONDATA-3754:
---

 Summary: old data and index files not getting cleared after 
SIrebuild
 Key: CARBONDATA-3754
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3754
 Project: CarbonData
  Issue Type: Bug
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal


Old data and index files are not cleared after SI rebuild.




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc

2020-03-27 Thread GitBox

ajantha-bhat commented on a change in pull request #3520: [CARBONDATA-3548]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399068116
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,109 @@
+
+
+# What is spatial index
+
+[A spatial index](https://gistbok.ucgis.org/topic-keywords/indexing) is a data 
structure that allows for accessing a spatial object efficiently. It is a 
common technique used by spatial databases.  Without indexing, any search for a 
feature would require a "sequential scan" of every record in the database, 
resulting in much longer processing time. In a spatial index construction 
process, the minimum bounding rectangle serves as an object approximation. 
Various types of spatial indices across commercial and open-source databases 
yield measurable performance differences. Spatial indexing techniques are 
playing a central role in time-critical applications and the manipulation of 
spatial big data.
+
+
+
+# How does carbondata implement spatial index
+
+There are many open-source implementations for spatial indexing and for processing 
spatial queries. CarbonData implements spatial indexing in a different way. Its 
core idea is to use raster data. A raster is made up of a matrix of cells 
organized into rows and columns (called a grid). Each cell represents a 
coordinate. The index for that coordinate is generated using longitude and 
latitude, like the [Z order curve](https://en.wikipedia.org/wiki/Z-order_curve).
+
+CarbonData rasterizes the user data during data load into segments. A set of 
latitude and longitude represents a grid range. The size of the grid can be 
configured. Hence, the coordinates loaded are often discrete and not continuous.
+
+The figure below shows the relationship between the grid and the points residing 
in it. The black point represents the center point of the grid, and the red points 
are the coordinates at arbitrary positions inside the grid. The red points 
can be replaced by the center point of the grid to indicate that the points 
lie within the grid. During data load, CarbonData generates an index for each 
coordinate according to the row and column of the grid (in the raster) where that 
coordinate lies. These indexes are the same as Z order. For the detailed 
conversion algorithm, please refer to the design documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the polygon coordinates in real space. CarbonData 
uses the polygon and the spatial region information passed in when the table was 
created to build a quadtree. The nodes in the quadtree are composed of hash ids 
generated by the row and column information projected in the polygon area. When 
the query polygon area is not disjoint from the grid center point, the grid is 
considered selected. In the following figure, the user selects a quadrilateral 
polygon. The grids whose center points lie in the region generate a 
quadtree. A list of continuous id ranges is generated during the 
query, like [97->97  99->99  102->102  104->111  120->120  122->123  
151->151  157->158  159->159  192->208  210->210  216->216  225->225  
228->229], where each entry represents a continuous grid area. CarbonData 
uses that range list to prune and filter. More details can be found under 
https://issues.apache.org/jira/browse/CARBONDATA-3548
 
 Review comment:
   Please mention the lines below:
   The main reason for faster spatial queries in carbon is that the polygon 
filter is pushed down to the carbon layer, so carbon scans only the 
matched blocklets instead of doing a full scan.
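
To make the two ideas in the quoted guide concrete, here is a minimal sketch of a Z-order style index and of collapsing selected grid ids into a range list like `104->111`. It is an illustration only and not CarbonData's conversion algorithm, which the guide defers to the design documents: the class `ZOrderSketch`, the methods `zIndex` and `toRanges`, and the choice to put column bits in even positions are all assumptions, and row and column are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not CarbonData's implementation.
final class ZOrderSketch {

    // Interleaves the bits of a grid row and column (both assumed non-negative):
    // column bits land on even positions, row bits on odd positions.
    static long zIndex(int row, int col) {
        long z = 0L;
        for (int i = 0; i < 32; i++) {
            z |= (((long) (col >>> i)) & 1L) << (2 * i);
            z |= (((long) (row >>> i)) & 1L) << (2 * i + 1);
        }
        return z;
    }

    // Collapses grid ids into inclusive [start, end] ranges of consecutive ids.
    static List<long[]> toRanges(long[] ids) {
        long[] sorted = ids.clone();
        Arrays.sort(sorted);
        List<long[]> ranges = new ArrayList<long[]>();
        for (long id : sorted) {
            if (!ranges.isEmpty()) {
                long[] last = ranges.get(ranges.size() - 1);
                if (id <= last[1] + 1) {
                    // A duplicate or adjacent id extends the current range.
                    last[1] = Math.max(last[1], id);
                    continue;
                }
            }
            ranges.add(new long[] {id, id});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : toRanges(new long[] {97, 99, 102, 104, 105, 106})) {
            System.out.println(r[0] + "->" + r[1]);
        }
    }
}
```

With such a range list, a pushed-down polygon filter only needs to compare each blocklet's minimum and maximum index against the ranges, which is what allows non-matching blocklets to be skipped. Note that this sketch merges adjacent ranges, so it would not reproduce the example list in the guide entry for entry.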



[GitHub] [carbondata] ajantha-bhat edited a comment on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc

2020-03-27 Thread GitBox

ajantha-bhat edited a comment on issue #3520: [CARBONDATA-3548]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#issuecomment-604842289
 
 
   @MarvinLitt : please handle the comment. I want to merge this PR. 


