[GitHub] [carbondata] dependabot[bot] commented on issue #3447: Bump dep.jackson.version from 2.6.5 to 2.10.1 in /store/sdk
dependabot[bot] commented on issue #3447: Bump dep.jackson.version from 2.6.5 to 2.10.1 in /store/sdk URL: https://github.com/apache/carbondata/pull/3447#issuecomment-605373307 Dependabot tried to update this pull request, but something went wrong. We're looking into it, but in the meantime you can retry the update by commenting `@dependabot rebase`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] dependabot[bot] commented on issue #3456: Bump solr.version from 6.3.0 to 8.3.0 in /datamap/lucene
dependabot[bot] commented on issue #3456: Bump solr.version from 6.3.0 to 8.3.0 in /datamap/lucene URL: https://github.com/apache/carbondata/pull/3456#issuecomment-605373310 Dependabot tried to update this pull request, but something went wrong. We're looking into it, but in the meantime you can retry the update by commenting `@dependabot rebase`.
[GitHub] [carbondata] asfgit closed pull request #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI
asfgit closed pull request #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI URL: https://github.com/apache/carbondata/pull/3685
[GitHub] [carbondata] niuge01 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI
niuge01 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI URL: https://github.com/apache/carbondata/pull/3685#issuecomment-605371290 LGTM
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI
CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI URL: https://github.com/apache/carbondata/pull/3685#issuecomment-605280758 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2583/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI
CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI URL: https://github.com/apache/carbondata/pull/3685#issuecomment-605267135 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/875/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc
CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc URL: https://github.com/apache/carbondata/pull/3520#issuecomment-605217739 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2582/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc
CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc URL: https://github.com/apache/carbondata/pull/3520#issuecomment-605214533 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/874/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files
CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files URL: https://github.com/apache/carbondata/pull/3683#issuecomment-605205781 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/871/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files
CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files URL: https://github.com/apache/carbondata/pull/3683#issuecomment-605197577 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2579/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files
CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files URL: https://github.com/apache/carbondata/pull/3683#issuecomment-605131953 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2575/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files
CarbonDataQA1 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files URL: https://github.com/apache/carbondata/pull/3683#issuecomment-605130361 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/867/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399400512

## File path: docs/spatial-index-guide.md ##
@@ -0,0 +1,109 @@
+
+
+# What is spatial index
+
+[A spatial index](https://gistbok.ucgis.org/topic-keywords/indexing) is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases. Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+# How does carbondata implement spatial index
+
+There are many opensource implementations for spatial indexing and to process spatial queries. CarbonData implements a different way of spatial index. Its core idea is to use the raster data. Raster is made up of matrix of cells organized into rows and columns(called a grid). Each cell represents a coordinate. And the index for that coodrinate is generated using longitude and latitude, like the [Z order curve](https://en.wikipedia.org/wiki/Z-order_curve).
+
+CarbonData rasterize the user data during data load into segments. A set of latitude and longitude represents a grid range. The size of the grid can be configured. Hence, the coordinates loaded are often discrete and not continuous.
+
+Below figure shows the relationship between the grid and the points residing in it. Black point represents the center point of the grid, and the red points are the coordinates at the arbitrary positions inside the grid. The red points can be replaced by the center point of the grid to indicate that the points lies within the grid. During data load, CarbonData generates an Index for coordinate according to row and column of the grid(in the raster) where that coordinate lies. These Indexes are the same as Z order. For the detailed conversion algorithm, please refer to the design documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata use the polygon and spatial region information passed in when creating a table build a quad tree. The nodes in the quad tree are composed of hash ids generated by the row and column information projected in the polygon area. When the query polygon area is not disjoint from the grid center point, the grid is considered selected. In the following figure, user select a quadrilateral polygon, The grid with the center point in the region will generate a quadtree. A list of line with continuous properties will be generated in the query process, like [97->97 99->99 102->102 104->111 120->120 122->123 151->151 157->158 159->159 192->208 210->210 216->216 225->225 228->229], each part of the list represents a continuous grid area. Carbondata use that line list to prune and filtered. About the detail can be search under https://issues.apache.org/jira/browse/CARBONDATA-3548
+
+![File Directory Structure](../docs/images/spatial-index-2.png?raw=true)
+
+
+
+# Installation and Deployment
+
+Build source with modules geo open, can open "pom.xml" and check whether the mode has been open.
+
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)
+
+Then you can get the "carbondata-geo-2.0.0-SNAPSHOT.jar" keep this jar and 'jst-core.jar' to your carbonlib path.
+
+## Basic Command
+
+### Create Table
+
+spatial index need to appoint the source column and other regional information. carbon will create a Invisible hash id column.
+
+example
+
+```sql
+create table source_index(id BIGINT, latitude long, longitude long) stored by 'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash',
+'INDEX_HANDLER.mygeohash.type'='geohash',
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',
+'INDEX_HANDLER.mygeohash.gridSize'='50',
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',
+'INDEX_HANDLER.mygeohash.conversionRatio'='100');

Review comment: This is already covered in the table properties description below, so I don't think it needs to be repeated here.
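Note on the guide quoted above: the index it describes, a grid row and column combined "like the Z order curve", can be illustrated with a small sketch. This is not CarbonData's actual conversion algorithm (the guide defers that to the design documents). The function names `grid_cell` and `interleave_bits`, the `cell_deg` parameter, and the choice of which axis takes the even bits are all assumptions made for illustration.

```python
from typing import Tuple


def grid_cell(lon: float, lat: float, min_lon: float, min_lat: float,
              cell_deg: float) -> Tuple[int, int]:
    """Snap a coordinate to the (row, col) of the grid cell that contains it.

    cell_deg is a hypothetical cell edge length in degrees.
    """
    col = int((lon - min_lon) // cell_deg)
    row = int((lat - min_lat) // cell_deg)
    return row, col


def interleave_bits(row: int, col: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of row and col into one Z-order code.

    Bit i of col lands on bit 2*i, bit i of row lands on bit 2*i + 1
    (an assumed convention; the opposite assignment works equally well).
    """
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code


# Two nearby points that fall in the same cell share one index value.
cell_a = grid_cell(1.2, 0.7, 0.0, 0.0, 0.5)
cell_b = grid_cell(1.4, 0.9, 0.0, 0.0, 0.5)
index_a = interleave_bits(*cell_a)
index_b = interleave_bits(*cell_b)
```

Because every point in a cell maps to the same code, the stored coordinates become discrete, which matches the guide's remark that loaded coordinates are "discrete and not continuous".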
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399400743

## File path: docs/spatial-index-guide.md ##
@@ -0,0 +1,109 @@
[...]
+When querying, the user enters the true space polygon coordinates, carbondata use the polygon and spatial region information passed in when creating a table build a quad tree. The nodes in the quad tree are composed of hash ids generated by the row and column information projected in the polygon area. When the query polygon area is not disjoint from the grid center point, the grid is considered selected. In the following figure, user select a quadrilateral polygon, The grid with the center point in the region will generate a quadtree. A list of line with continuous properties will be generated in the query process, like [97->97 99->99 102->102 104->111 120->120 122->123 151->151 157->158 159->159 192->208 210->210 216->216 225->225 228->229], each part of the list represents a continuous grid area. Carbondata use that line list to prune and filtered. About the detail can be search under https://issues.apache.org/jira/browse/CARBONDATA-3548

Review comment: Added
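Note on the range list quoted above: each `a->b` entry covers a run of consecutive grid ids, so a filter only needs to test whether a row's id falls inside one of the runs. A minimal sketch of that idea follows. The helper names `to_ranges` and `in_ranges` are hypothetical, and the merging rule is an assumption: the quoted list keeps `157->158` and `159->159` as separate entries, so the real implementation evidently does not merge in exactly this way.

```python
from typing import Iterable, List, Tuple


def to_ranges(cell_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse grid ids into sorted, inclusive (start, end) runs."""
    ranges: List[Tuple[int, int]] = []
    for cid in sorted(set(cell_ids)):
        if ranges and cid == ranges[-1][1] + 1:
            # Extend the current run by one id.
            ranges[-1] = (ranges[-1][0], cid)
        else:
            # Start a new run.
            ranges.append((cid, cid))
    return ranges


def in_ranges(cid: int, ranges: List[Tuple[int, int]]) -> bool:
    """Return True if the id lies inside any inclusive run."""
    return any(lo <= cid <= hi for lo, hi in ranges)


# The first few ids from the quoted example list.
selected = [97, 99, 102] + list(range(104, 112))
runs = to_ranges(selected)
```

A linear scan is enough for a list this short; a binary search over the sorted runs would be the obvious refinement for long lists.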
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399399530

## File path: docs/spatial-index-guide.md ##
@@ -0,0 +1,109 @@
[...]
+| Name| Value | Describe |

Review comment: Modified
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399399033

## File path: docs/spatial-index-guide.md ##
@@ -0,0 +1,109 @@
[...]
+# Installation and Deployment

Review comment: Modified
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399398817

## File path: docs/spatial-index-guide.md ##
@@ -0,0 +1,109 @@
[...]
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)

Review comment: Modified
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files
akashrn5 commented on a change in pull request #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files
URL: https://github.com/apache/carbondata/pull/3683#discussion_r399380210

## File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java ##
@@ -579,10 +580,28 @@ public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) thr
       }
       String UUID = String.valueOf(System.currentTimeMillis());
       List segmentFilesToBeUpdatedLatest = new ArrayList<>();
+      CarbonFile segmentFilesLocation =
+          FileFactory.getCarbonFile(CarbonTablePath.getSegmentFilesLocation(table.getTablePath()));
       for (Segment segment : segmentFilesToBeUpdated) {
-        String file =
-            SegmentFileStore.writeSegmentFile(table, segment.getSegmentNo(), UUID);
-        segmentFilesToBeUpdatedLatest.add(new Segment(segment.getSegmentNo(), file));
+        SegmentFileStore fileStore =
+            new SegmentFileStore(table.getTablePath(), segment.getSegmentFileName());
+        segment.setSegmentMetaDataInfo(fileStore.getSegmentFile().getSegmentMetaDataInfo());

Review comment: added
[GitHub] [carbondata] ydvpankaj99 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files
ydvpankaj99 commented on issue #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files URL: https://github.com/apache/carbondata/pull/3683#issuecomment-605046083 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI
CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI URL: https://github.com/apache/carbondata/pull/3685#issuecomment-605032353 Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/865/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI
CarbonDataQA1 commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI URL: https://github.com/apache/carbondata/pull/3685#issuecomment-605030448 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2573/
[GitHub] [carbondata] jackylk commented on issue #3684: [CARBONDATA-3756] Fix stage query bug it only read the first blocklet of each carbondata file
jackylk commented on issue #3684: [CARBONDATA-3756] Fix stage query bug it only read the first blocklet of each carbondata file URL: https://github.com/apache/carbondata/pull/3684#issuecomment-605002288 LGTM
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files
Indhumathi27 commented on a change in pull request #3683: [CARBONDATA-3755]Fix clean up issue with respect to segmentMetadaInfo after update and clean files
URL: https://github.com/apache/carbondata/pull/3683#discussion_r399261249

## File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java ##
@@ -579,10 +580,28 @@ public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) thr
       }
       String UUID = String.valueOf(System.currentTimeMillis());
       List segmentFilesToBeUpdatedLatest = new ArrayList<>();
+      CarbonFile segmentFilesLocation =
+          FileFactory.getCarbonFile(CarbonTablePath.getSegmentFilesLocation(table.getTablePath()));
       for (Segment segment : segmentFilesToBeUpdated) {
-        String file =
-            SegmentFileStore.writeSegmentFile(table, segment.getSegmentNo(), UUID);
-        segmentFilesToBeUpdatedLatest.add(new Segment(segment.getSegmentNo(), file));
+        SegmentFileStore fileStore =
+            new SegmentFileStore(table.getTablePath(), segment.getSegmentFileName());
+        segment.setSegmentMetaDataInfo(fileStore.getSegmentFile().getSegmentMetaDataInfo());

Review comment:
Can you please add a test case for this issue, if possible?
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604995090 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2572/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604992269 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/864/
[GitHub] [carbondata] ajantha-bhat commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI
ajantha-bhat commented on issue #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI URL: https://github.com/apache/carbondata/pull/3685#issuecomment-604967884 @niuge01 , @jackylk : please check
[GitHub] [carbondata] ajantha-bhat opened a new pull request #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI
ajantha-bhat opened a new pull request #3685: [HOTFIX] Fix all flink test case failure and enable UT in CI
URL: https://github.com/apache/carbondata/pull/3685

### Why is this PR needed?
After #3628, the default BATCH_FILE_ORDER is wrong (it is neither ASC nor DSC), so all test cases in the flink module fail because no order is set in the stage command. Also, the flink UTs are not run in CI, so the failure was not caught.

### What changes were proposed in this PR?
Fix the default value to ascending order. Enable UT runs for the flink module.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3684: [CARBONDATA-3756] Fix stage query bug it only read the first blocklet of each carbondata file
CarbonDataQA1 commented on issue #3684: [CARBONDATA-3756] Fix stage query bug it only read the first blocklet of each carbondata file URL: https://github.com/apache/carbondata/pull/3684#issuecomment-604943790 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2571/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3684: [CARBONDATA-3756] Fix stage query bug it only read the first blocklet of each carbondata file
CarbonDataQA1 commented on issue #3684: [CARBONDATA-3756] Fix stage query bug it only read the first blocklet of each carbondata file URL: https://github.com/apache/carbondata/pull/3684#issuecomment-604941008 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/863/
[GitHub] [carbondata] brijoobopanna commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
brijoobopanna commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604935581 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc
CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc URL: https://github.com/apache/carbondata/pull/3520#issuecomment-604894818 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2570/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc
CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc URL: https://github.com/apache/carbondata/pull/3520#issuecomment-604892466 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/862/
[GitHub] [carbondata] QiangCai opened a new pull request #3684: [CARBONDATA-3756] Fix stage query bug it only read the first blocklet of each carbondata file
QiangCai opened a new pull request #3684: [CARBONDATA-3756] Fix stage query bug it only read the first blocklet of each carbondata file
URL: https://github.com/apache/carbondata/pull/3684

### Why is this PR needed?
The query of stage files only reads the first blocklet of each carbondata file. So when a file contains multiple blocklets, the query result will be wrong.

### What changes were proposed in this PR?
The query of stage files should read all blocklets of all carbondata files.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No
[jira] [Created] (CARBONDATA-3756) the query of stage files only read the first blocklet of each carbondata file
David Cai created CARBONDATA-3756:

Summary: the query of stage files only read the first blocklet of each carbondata file
Key: CARBONDATA-3756
URL: https://issues.apache.org/jira/browse/CARBONDATA-3756
Project: CarbonData
Issue Type: Improvement
Reporter: David Cai

The query of stage files only reads the first blocklet of each carbondata file. If the file contains multiple blocklets, the query result will be wrong.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3755) segment metadata info is not copied to new segment files after update and clean files
Akash R Nilugal created CARBONDATA-3755:

Summary: segment metadata info is not copied to new segment files after update and clean files
Key: CARBONDATA-3755
URL: https://issues.apache.org/jira/browse/CARBONDATA-3755
Project: CarbonData
Issue Type: Bug
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal

Segment metadata info is not copied to new segment files after update and clean files. Old segment files are also not getting deleted.
[jira] [Created] (CARBONDATA-3754) old data and index files not getting cleared after SIrebuild
Akash R Nilugal created CARBONDATA-3754:

Summary: old data and index files not getting cleared after SIrebuild
Key: CARBONDATA-3754
URL: https://issues.apache.org/jira/browse/CARBONDATA-3754
Project: CarbonData
Issue Type: Bug
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal

Old data and index files are not getting cleared after SI rebuild.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
ajantha-bhat commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399068116

## File path: docs/spatial-index-guide.md ##
@@ -0,0 +1,109 @@
+
+
+# What is spatial index
+
+[A spatial index](https://gistbok.ucgis.org/topic-keywords/indexing) is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases. Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+# How does carbondata implement spatial index
+
+There are many opensource implementations for spatial indexing and to process spatial queries. CarbonData implements a different way of spatial index. Its core idea is to use the raster data. Raster is made up of matrix of cells organized into rows and columns(called a grid). Each cell represents a coordinate. And the index for that coodrinate is generated using longitude and latitude, like the [Z order curve](https://en.wikipedia.org/wiki/Z-order_curve).
+
+CarbonData rasterize the user data during data load into segments. A set of latitude and longitude represents a grid range. The size of the grid can be configured. Hence, the coordinates loaded are often discrete and not continuous.
+
+Below figure shows the relationship between the grid and the points residing in it. Black point represents the center point of the grid, and the red points are the coordinates at the arbitrary positions inside the grid.
+The red points can be replaced by the center point of the grid to indicate that the points lies within the grid. During data load, CarbonData generates an Index for coordinate according to row and column of the grid(in the raster) where that coordinate lies. These Indexes are the same as Z order. For the detailed conversion algorithm, please refer to the design documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata use the polygon and spatial region information passed in when creating a table build a quad tree. The nodes in the quad tree are composed of hash ids generated by the row and column information projected in the polygon area. When the query polygon area is not disjoint from the grid center point, the grid is considered selected. In the following figure, user select a quadrilateral polygon, The grid with the center point in the region will generate a quadtree. A list of line with continuous properties will be generated in the query process, like [97->97 99->99 102->102 104->111 120->120 122->123 151->151 157->158 159->159 192->208 210->210 216->216 225->225 228->229], each part of the list represents a continuous grid area. Carbondata use that line list to prune and filtered. About the detail can be search under https://issues.apache.org/jira/browse/CARBONDATA-3548

Review comment:
Please also mention the following: the main reason spatial queries are faster in carbon is that the polygon filter is pushed down to the carbon layer, so carbon scans only the matched blocklets instead of doing a full scan.
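The grid-to-index mapping described in the quoted guide (a coordinate is snapped to a grid cell, and the cell's row and column are combined into a Z-order index) can be sketched roughly as below. This is only an illustration of generic Z-order (Morton) bit interleaving, not CarbonData's actual conversion algorithm, which the guide defers to the design documents. The class and method names (`ZOrderSketch`, `spreadBits`, `zIndex`, `gridCell`) and the grid origin and cell size are assumptions made for the example.

```java
// Hypothetical sketch: generic Z-order (Morton) indexing of grid cells.
// Not taken from the CarbonData code base; all names are illustrative.
final class ZOrderSketch {
    private ZOrderSketch() {}

    /** Spreads the low 16 bits of v so that bit i moves to bit 2*i. */
    public static int spreadBits(int v) {
        v &= 0x0000FFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    /** Interleaves column bits (even positions) with row bits (odd positions). */
    public static long zIndex(int column, int row) {
        long evenBits = spreadBits(column) & 0xFFFFFFFFL;
        long oddBits = (spreadBits(row) & 0xFFFFFFFFL) << 1;
        return evenBits | oddBits;
    }

    /** Maps a coordinate to the number of the grid cell that contains it. */
    public static int gridCell(double value, double min, double cellSize) {
        return (int) Math.floor((value - min) / cellSize);
    }

    public static void main(String[] args) {
        // Assumed grid: origin (0, 0), cell size 1.0 in both directions.
        int column = gridCell(2.5, 0.0, 1.0); // longitude -> column
        int row = gridCell(1.2, 0.0, 1.0);    // latitude  -> row
        System.out.println("cell (" + column + ", " + row + ") -> index " + zIndex(column, row));
    }
}
```

Because neighbouring cells tend to receive nearby index values, a query polygon can be expressed as a short list of contiguous index ranges, which is the kind of range list the guide shows for pruning.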
[GitHub] [carbondata] ajantha-bhat edited a comment on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc
ajantha-bhat edited a comment on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc URL: https://github.com/apache/carbondata/pull/3520#issuecomment-604842289 @MarvinLitt : please handle the comment. I want to merge this PR.