Re: [PR] DRILL-8478: HashPartition memory leak when memory allocation throws OutOfMemoryException (#2874) (drill)
shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1462854154

## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:

@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, BufferAllocator allocator, Chained
         .build(logger);
     } catch (SchemaChangeException sce) {
       throw new IllegalStateException("Unexpected Schema Change while creating a hash table", sce);
-    }
-    this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-    tmpBatchesList = new ArrayList<>();
-    if (numPartitions > 1) {
-      allocateNewCurrentBatchAndHV();
+    } catch (OutOfMemoryException oom) {
+      close();
+      throw UserException.memoryError(oom)
+        .message("OutOfMemory while allocate memory for hash partition.")

Review Comment:
I resubmitted the PR and added the test steps.
Re: [PR] DRILL-8478: HashPartition memory leak when memory allocation throws OutOfMemoryException (#2874) (drill)
paul-rogers commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1462817821

## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:

@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, BufferAllocator allocator, Chained
         .build(logger);
     } catch (SchemaChangeException sce) {
       throw new IllegalStateException("Unexpected Schema Change while creating a hash table", sce);
-    }
-    this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-    tmpBatchesList = new ArrayList<>();
-    if (numPartitions > 1) {
-      allocateNewCurrentBatchAndHV();
+    } catch (OutOfMemoryException oom) {
+      close();
+      throw UserException.memoryError(oom)
+        .message("OutOfMemory while allocate memory for hash partition.")

Review Comment:
Suggested: `"Failed to allocate hash partition."` The `memoryError()` already indicates that it is an OOM error.
Re: [PR] DRILL-8478: HashPartition memory leak when memory allocation throws OutOfMemoryException (#2874) (drill)
paul-rogers commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1462817821

## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:

@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, BufferAllocator allocator, Chained
         .build(logger);
     } catch (SchemaChangeException sce) {
       throw new IllegalStateException("Unexpected Schema Change while creating a hash table", sce);
-    }
-    this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-    tmpBatchesList = new ArrayList<>();
-    if (numPartitions > 1) {
-      allocateNewCurrentBatchAndHV();
+    } catch (OutOfMemoryException oom) {
+      close();
+      throw UserException.memoryError(oom)
+        .message("OutOfMemory while allocate memory for hash partition.")

Review Comment:
Suggested: `"Failed to allocate hash partition."` The `memoryError()` already indicates that it is an OOM error.

## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/AbstractHashBinaryRecordBatch.java:

@@ -1312,7 +1313,9 @@ private void cleanup() {
     }
     // clean (and deallocate) each partition, and delete its spill file
     for (HashPartition partn : partitions) {
-      partn.close();
+      if (Objects.nonNull(partn)) {

Review Comment:
Simpler: `if (partn != null) {`
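Taken together, the two review threads describe one pattern: a constructor that can fail mid-allocation must release what it already acquired, and the batch-level cleanup must tolerate partitions whose constructor never completed. Below is a self-contained Java sketch of that pattern; the class and method names are hypothetical stand-ins, not Drill's real `HashPartition` or allocator API, and the real code throws Drill's `OutOfMemoryException` rather than Java's `OutOfMemoryError`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HashPartition: acquires buffers in its constructor.
class PartitionSketch implements AutoCloseable {
  private final List<long[]> buffers = new ArrayList<>();

  PartitionSketch(int numBatches, int batchSize) {
    try {
      for (int i = 0; i < numBatches; i++) {
        buffers.add(new long[batchSize]); // stand-in for an allocator call that may throw
      }
    } catch (OutOfMemoryError oom) {
      close(); // release the partially acquired buffers; the caller never gets a reference
      throw new IllegalStateException("Failed to allocate hash partition.", oom);
    }
  }

  @Override
  public void close() {
    buffers.clear(); // a real implementation would release each buffer back to its allocator
  }

  // Mirrors the null guard suggested for AbstractHashBinaryRecordBatch.cleanup():
  // a slot stays null when its partition's constructor threw before assignment.
  static void cleanup(PartitionSketch[] partitions) {
    for (PartitionSketch partn : partitions) {
      if (partn != null) {
        partn.close();
      }
    }
  }
}
```

The null guard is what makes the constructor-side `close()` safe: once the constructor rethrows, the caller's partition array keeps a null in that slot, and cleanup must skip it rather than NPE.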
[PR] DRILL-8478: HashPartition memory leak when memory allocation throws OutOfMemoryException (#2874) (drill)
shfshihuafeng opened a new pull request, #2875:
URL: https://github.com/apache/drill/pull/2875

# [DRILL-8478](https://issues.apache.org/jira/browse/DRILL-8478): HashPartition memory leak when memory allocation throws OutOfMemoryException (#2874)

## Description
When allocating memory for a HashPartition throws an OutOfMemoryException, memory is leaked: because the HashPartition object was never constructed successfully, it cannot be cleaned up in the close phase.

## Documentation
(Please describe user-visible changes similar to what should appear in the Drill documentation.)

## Testing
(Please describe how this PR has been tested.)
[jira] [Created] (DRILL-8478) merge join memory leak on exception
shihuafeng created DRILL-8478:
---------------------------------

             Summary: merge join memory leak on exception
                 Key: DRILL-8478
                 URL: https://issues.apache.org/jira/browse/DRILL-8478
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators
    Affects Versions: 1.21.1
            Reporter: shihuafeng
             Fix For: 1.21.2

*Describe the bug*
HashPartition leaks memory when memory allocation throws an OutOfMemoryException.

*To Reproduce*
Steps to reproduce the behavior:
# prepare data for TPC-H SF1
# run TPC-H query 8 with 20 concurrent sessions
# set direct memory to 5 GB
# when an OutOfMemoryException occurred, stop all queries
# observe the memory leak

*Expected behavior*
(1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
(2) I ran query 8 (SQL below) with 20 concurrent sessions
(3) It threw an OutOfMemoryException when creating a HashPartition

*Error detail, log output or screenshots*
Unable to allocate buffer of size 262144 (rounded from 262140) due to memory limit (41943040). Current allocation: 20447232

SQL:
{code:sql}
select o_year,
       sum(case when nation = 'CHINA' then volume else 0 end) / sum(volume) as mkt_share
from (
  select extract(year from o_orderdate) as o_year,
         l_extendedprice * 1.0 as volume,
         n2.n_name as nation
  from hive.tpch1s.part,
       hive.tpch1s.supplier,
       hive.tpch1s.lineitem,
       hive.tpch1s.orders,
       hive.tpch1s.customer,
       hive.tpch1s.nation n1,
       hive.tpch1s.nation n2,
       hive.tpch1s.region
  where p_partkey = l_partkey
    and s_suppkey = l_suppkey
    and l_orderkey = o_orderkey
    and o_custkey = c_custkey
    and c_nationkey = n1.n_nationkey
    and n1.n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and s_nationkey = n2.n_nationkey
    and o_orderdate between date '1995-01-01' and date '1996-12-31'
    and p_type = 'LARGE BRUSHED BRASS') as all_nations
group by o_year
order by o_year
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
Re: [I] hashpartition leak (drill)
shfshihuafeng commented on issue #2874:
URL: https://github.com/apache/drill/issues/2874#issuecomment-1905301573

OK @cgivre
Re: [I] the merge join memory leak (drill)
shfshihuafeng commented on issue #2871:
URL: https://github.com/apache/drill/issues/2871#issuecomment-1905300921

> @weijunlu, you may have found a bug. The behavior you described is not expected.
>
> Just to verify I understand: you ran your query once. Did the query run successfully the first time? Or, did your first query fail? If the query _worked_ the first time, we have one situation. If the query _failed_ the first time, we have another situation.
>
> Then, you ran the _same_ query a second time? This time you got an OOM error?
>
> I think we may have up to three distinct issues.
>
> First, I missed one important piece of information when I first read this report. You reported an exception: `Memory was leaked by query. Memory leaked: (249856)`. This message indicates an actual bug. As I explained, Drill won't release the memory back to the OS. But, the query should have released the memory back to Netty. The error message says that it did not. This would be a bug. Even if the query fails, Drill is required to return memory to Netty. Such errors are, unfortunately, hard to track down. Our unit tests have many such checks, but your query appears to have uncovered a case that the unit tests do not check.
>
> The stack trace suggests that the memory leak was detected as Drill was trying to shut down a failed query. So, I suspect your query run may not have actually worked. If it did work, you should see in the UI a very detailed report on all the operators in the query. Did you get that detailed report?
>
> Second, I wonder if Drill has sufficient memory for the query you are trying to run. The TPCH queries tend to be memory-hungry. This query has six concurrent sorts. Then, it has five merge joins, plus more sorts. While the query runs, all the needed data will be in memory. (It is in memory because Drill uses memory to buffer data to keep things fast.)
>
> I do not recall the size of an SF1 TPCH data set. What is the total size of the files in your data set?
>
> Back when we used to run TPCH tests regularly, we would use a cluster of 5 or 10 Drillbits, each with a generous amount of memory, so that Drill could hold the data in memory. I suspect that 2 GB of direct memory, on a single Drillbit, is probably not enough for this kind of query.
>
> So, I suspect you need more direct memory. The UI says that you are using only 34% of a 4GB heap, but ~100% of 2GB direct memory. Try changing this allocation. Try allocating 3GB to heap. This will raise heap usage to 45%. Then, give as much memory as your machine has to direct memory. If you are running SqlLine on the same machine, please run it on another machine instead. If you have other services on your machine (a database, an IDE, etc.), I would recommend moving Drill to a dedicated machine. You may even consider using a cluster of machines to provide sufficient resources.
>
> Once you get a query that succeeds, you can look at the detailed query profile to find out how much memory it required. Then, you can reduce the direct memory given to Drill to a lower amount, if the query allows.
>
> Once you find a memory level that allows the query to run, you will probably avoid the memory leak error. The query will succeed and memory should be freed back to Netty as normal. TPCH SF1 queries used to pass. They should still work, unless something changed in the last couple of years in one of the operators.
>
> Just to be clear, Drill works well on a laptop with the standard amount of memory _if you run simple queries on small datasets_. But, if you do classic TPCH "big data" queries, you need a cluster with enough capacity -- that's why we use Hadoop.
>
> The third issue is a minor point: the UI screenshot is a bit odd: it says you are at 94% of direct memory. I suspect you are actually at 100%, and the UI is dividing by the wrong number (1000 vs. 1024 or some such).

I think this is the leak point: when the SQL query stops, the temporary memory should be cleaned up.
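Paul's sizing advice maps onto Drill's launch-time environment settings. A hedged sketch of the kind of change he is suggesting, in `conf/drill-env.sh` (the values are illustrative, not a recommendation for every machine):

```sh
# conf/drill-env.sh -- illustrative values per the advice above
export DRILL_HEAP="3G"                # raise heap so heap usage sits near 45%
export DRILL_MAX_DIRECT_MEMORY="8G"   # give direct memory as much as the host allows
```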
Re: [I] hashpartition leak (drill)
cgivre commented on issue #2874:
URL: https://github.com/apache/drill/issues/2874#issuecomment-1905299722

@shfshihuafeng Can you please submit this as a pull request?
Re: [I] hashpartition leak (drill)
shfshihuafeng commented on issue #2874:
URL: https://github.com/apache/drill/issues/2874#issuecomment-1905298953

I fixed the bug; see the attached patch:
[0001-bug-10227615-hashPartition-leak.patch](https://github.com/apache/drill/files/14018533/0001-bug-10227615-hashPartition-leak.patch)
[I] hashpartition leak (drill)
shfshihuafeng opened a new issue, #2874:
URL: https://github.com/apache/drill/issues/2874

**Describe the bug**
HashPartition leaks memory when memory allocation throws an OutOfMemoryException.

**To Reproduce**
Steps to reproduce the behavior:
1. prepare data for TPC-H SF1
2. run TPC-H query 8 with 20 concurrent sessions
3. set direct memory to 5 GB
4. when an OutOfMemoryException occurred, stop all queries
5. observe the memory leak

**Expected behavior**
(1) I set {DRILL_MAX_DIRECT_MEMORY:-"5G"}
(2) I ran query 8 (SQL in Additional context) with 20 concurrent sessions
(3) It threw an OutOfMemoryException when creating a HashPartition

**Error detail, log output or screenshots**
Unable to allocate buffer of size 262144 (rounded from 262140) due to memory limit (41943040). Current allocation: 20447232

**Drill version**
drill 2.0

**Additional context**
```
select o_year,
       sum(case when nation = 'CHINA' then volume else 0 end) / sum(volume) as mkt_share
from (
  select extract(year from o_orderdate) as o_year,
         l_extendedprice * 1.0 as volume,
         n2.n_name as nation
  from hive.tpch1s.part,
       hive.tpch1s.supplier,
       hive.tpch1s.lineitem,
       hive.tpch1s.orders,
       hive.tpch1s.customer,
       hive.tpch1s.nation n1,
       hive.tpch1s.nation n2,
       hive.tpch1s.region
  where p_partkey = l_partkey
    and s_suppkey = l_suppkey
    and l_orderkey = o_orderkey
    and o_custkey = c_custkey
    and c_nationkey = n1.n_nationkey
    and n1.n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and s_nationkey = n2.n_nationkey
    and o_orderdate between date '1995-01-01' and date '1996-12-31'
    and p_type = 'LARGE BRUSHED BRASS') as all_nations
group by o_year
order by o_year
```
Re: [I] union duplicate key (drill)
yaozhu commented on issue #2873:
URL: https://github.com/apache/drill/issues/2873#issuecomment-1905217272

This is a test case written by our testers; there is no actual usage scenario. What do you mean by combining them?
Re: [I] union duplicate key (drill)
yaozhu commented on issue #2873:
URL: https://github.com/apache/drill/issues/2873#issuecomment-1905216330

> I'm not yet convinced there is a bug in Drill here. I think the issue may actually be in your query. My hunch here is that there may be an issue in the `ORDER BY` statement. Your inner tables each have columns called `k1`. Then you are calling a sort on that without specifying which one it is. You might first try adding table names to the `ORDER BY` clause.
>
> Secondly, I'm not trying to be rude but what are you trying to achieve with this query? It looks like you are performing a union of a left and right join of the same table? Why not combine these? Then you have nested `SELECT *` statements that don't really seem to serve any purpose.

This is a test case written by our testers; there is no actual usage scenario. What do you mean by combining them?
Re: [I] union duplicate key (drill)
cgivre commented on issue #2873:
URL: https://github.com/apache/drill/issues/2873#issuecomment-1905132740

I'm not yet convinced there is a bug in Drill here. I think the issue may actually be in your query. My hunch here is that there may be an issue in the `ORDER BY` statement. Your inner tables each have columns called `k1`, and you are calling a sort on that column without specifying which one it is. You might first try adding table names to the `ORDER BY` clause.

Secondly, I'm not trying to be rude, but what are you trying to achieve with this query? It looks like you are performing a union of a left and a right join of the same table? Why not combine these? Then you have nested `SELECT *` statements that don't really seem to serve any purpose.
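A hedged illustration of the disambiguation cgivre suggests, using the `baseall`/`test` schema from the issue. The query shape is assumed, since the original query is not quoted in this thread:

```sql
-- Alias each branch and qualify the sort column so it is
-- unambiguous which k1 orders the final result.
SELECT u.k1, u.k2
FROM (
  SELECT a.k1, b.k2 FROM baseall a LEFT  JOIN test b ON a.k1 = b.k1
  UNION
  SELECT a.k1, b.k2 FROM baseall a RIGHT JOIN test b ON a.k1 = b.k1
) AS u
ORDER BY u.k1;
```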
Re: [I] union duplicate key (drill)
yaozhu commented on issue #2873:
URL: https://github.com/apache/drill/issues/2873#issuecomment-1905126799

@cgivre is that clear?
Re: [I] union duplicate key (drill)
yaozhu commented on issue #2873:
URL: https://github.com/apache/drill/issues/2873#issuecomment-1905105700

> Can you please provide a screenshot or some explanation that is more clear? This is really difficult to follow.

![image](https://github.com/apache/drill/assets/13361571/82b08db4-1d6a-48aa-9f0e-1db3c7018417)
Re: [I] union duplicate key (drill)
cgivre commented on issue #2873:
URL: https://github.com/apache/drill/issues/2873#issuecomment-1905101608

Can you please provide a screenshot or some explanation that is more clear? This is really difficult to follow.
[I] union duplicate key (drill)
yaozhu opened a new issue, #2873:
URL: https://github.com/apache/drill/issues/2873

The MySQL tables `baseall` and `test` have the same table structure and records.

show create table baseall;

    CREATE TABLE `baseall` (
      `k0` tinyint(1) DEFAULT NULL,
      `k1` tinyint DEFAULT NULL,
      `k2` smallint DEFAULT NULL,
      `k3` int DEFAULT NULL,
      `k4` bigint DEFAULT NULL,
      `k5` decimal(9,3) DEFAULT NULL,
      `k6` char(5) DEFAULT NULL,
      `k10` date DEFAULT NULL,
      `k11` datetime DEFAULT NULL,
      `k7` varchar(20) DEFAULT NULL,
      `k8` double DEFAULT NULL,
      `k9` float DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

show create table test;

    CREATE TABLE `test` (
      `k0` tinyint(1) DEFAULT NULL,
      `k1` tinyint DEFAULT NULL,
      `k2` smallint DEFAULT NULL,
      `k3` int DEFAULT NULL,
      `k4` bigint DEFAULT NULL,
      `k5` decimal(9,3) DEFAULT NULL,
      `k6` char(5) DEFAULT NULL,
      `k10` date DEFAULT NULL,
      `k11` datetime DEFAULT NULL,
      `k7` varchar(20) DEFAULT NULL,
      `k8` double DEFAULT NULL,
      `k9` float DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Re: Parquet files with microsecond columns
Hi Peter,

It sounds like you are on the right track: the new option is the quick short-term solution. The best long-term solution is to generalize Drill's date/time type, but that would take much more work. (Drill also has a bug where the treatment of timezones is incorrect, which forces Drill to run in the UTC time zone -- something that will also require difficult work.)

Given that JDBC works, the problem must be in the web interface, not in your Parquet implementation. You've solved the problem with a new session option. The web interface, however, has no sessions: if you set an option in one call, and do your query in another, Drill will have "forgotten" your option. Instead, there is a way to attach options to each query. Are you using that feature?

As I recall, the JSON message to submit a query has an additional field to hold session options. I do not recall, however, if the web UI added that feature. Does anyone else know?

Two workarounds. First, use your favorite JSON request tool to submit a query with the option set. Second, set your option as a system option so it is available to all sessions: ALTER SYSTEM SET...

Thanks,

- Paul

On Mon, Jan 22, 2024 at 1:38 AM Peter Franzen wrote:

> Hi,
>
> I am using Drill to query Parquet files that have fields of type
> timestamp_micros. By default, Drill truncates those microsecond values to
> milliseconds when reading the Parquet files in order to convert them to
> SQL timestamps.
>
> In some of my use cases I need to read the original microsecond values
> (as 64-bit values, not SQL timestamps) through Drill, but this doesn't
> seem to be possible (unless I've missed something).
>
> I have explored a possible solution to this, and would like to run it by
> some developers more experienced with the Drill code base before I create
> a pull request.
>
> My idea is to add two options similar to
> "store.parquet.reader.int96_as_timestamp" to control whether or not
> microsecond times and timestamps are truncated to milliseconds. These
> options would be added to "org.apache.drill.exec.ExecConstants" and
> "org.apache.drill.exec.server.options.SystemOptionManager", and to
> drill-module.conf:
>
>     store.parquet.reader.time_micros_as_int64: false,
>     store.parquet.reader.timestamp_micros_as_int64: false,
>
> These options would then be used in the same places as
> "store.parquet.reader.int96_as_timestamp":
>
>     org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
>     org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
>     org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
>
> to create an int64 reader instead of a time/timestamp reader when the
> corresponding option is set to true.
>
> In addition to this,
> "org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector" must
> be altered to _not_ truncate the min and max values for
> time_micros/timestamp_micros if the corresponding option is true. This
> class doesn't have a reference to an OptionManager, so my guess is that
> the two new options must be extracted from the OptionManager when the
> ParquetReaderConfig instance is created.
>
> Filtering on microsecond columns would be done using 64-bit values rather
> than TIME/TIMESTAMP values, e.g.
>
>     select * from <table> where <time_column> = 1705914906694751;
>
> I've tested the solution outlined above, and it seems to work when using
> sqlline and with the JDBC driver, but not with the web-based interface.
> Any pointers to the relevant code for that would be appreciated.
>
> An alternative solution to the above could be to intercept all reading of
> the Parquet schemas and modify the schema to report the microsecond
> columns as int64 columns, i.e. to completely discard the information that
> the columns contain time/timestamp values. This could potentially make
> parts of the code where it is not obvious that the time/timestamp
> properties of columns are used behave as expected. However, this variant
> would not align with how INT96 timestamps are handled.
>
> Any thoughts on this idea for how to access microsecond values would be
> highly appreciated.
>
> Thanks,
>
> /Peter
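For the first workaround, a sketch of what such a JSON request might look like, assuming the /query.json endpoint accepts the per-query "options" field Paul recalls (that field name is unverified here; the option shown is the one proposed in this thread, and the path and column name are illustrative):

    curl -X POST http://localhost:8047/query.json \
      -H "Content-Type: application/json" \
      -d '{
            "queryType": "SQL",
            "query": "SELECT ts_col FROM dfs.`/data/events.parquet` LIMIT 10",
            "options": {"store.parquet.reader.timestamp_micros_as_int64": true}
          }'

The second workaround needs no per-request plumbing, at the cost of affecting all sessions:

    ALTER SYSTEM SET `store.parquet.reader.timestamp_micros_as_int64` = true;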
Re: DFDL Standard approved for ISO JTC1 PAS (Publicly Available Specification)
Nice!

> On Jan 22, 2024, at 08:44, Mike Beckerle wrote:
>
> I received notice today that the DFDL OGF standard is officially headed to
> become an ISO standard. The ballot within ISO JTC1 passed 100%.
>
> I will let you know more info when I get it about how we can propagate this
> information, any branding considerations ISO JTC1 requires about it, their
> trademark rules, etc. At that point it will make sense to send this info to
> our users lists, make a blog post, etc.
>
> Mike Beckerle
> Apache Daffodil PMC | daffodil.apache.org
> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> Owl Cyber Defense | www.owlcyberdefense.com
DFDL Standard approved for ISO JTC1 PAS (Publicly Available Specification)
I received notice today that the DFDL OGF standard is officially headed to become an ISO standard. The ballot within ISO JTC1 passed 100%.

I will let you know more info when I get it about how we can propagate this information, any branding considerations ISO JTC1 requires about it, their trademark rules, etc. At that point it will make sense to send this info to our users lists, make a blog post, etc.

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com
Parquet files with microsecond columns
Hi,

I am using Drill to query Parquet files that have fields of type timestamp_micros. By default, Drill truncates those microsecond values to milliseconds when reading the Parquet files in order to convert them to SQL timestamps.

In some of my use cases I need to read the original microsecond values (as 64-bit values, not SQL timestamps) through Drill, but this doesn't seem to be possible (unless I've missed something).

I have explored a possible solution to this, and would like to run it by some developers more experienced with the Drill code base before I create a pull request.

My idea is to add two options similar to "store.parquet.reader.int96_as_timestamp" to control whether or not microsecond times and timestamps are truncated to milliseconds. These options would be added to "org.apache.drill.exec.ExecConstants" and "org.apache.drill.exec.server.options.SystemOptionManager", and to drill-module.conf:

    store.parquet.reader.time_micros_as_int64: false,
    store.parquet.reader.timestamp_micros_as_int64: false,

These options would then be used in the same places as "store.parquet.reader.int96_as_timestamp":

    org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
    org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
    org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter

to create an int64 reader instead of a time/timestamp reader when the corresponding option is set to true.

In addition to this, "org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector" must be altered to _not_ truncate the min and max values for time_micros/timestamp_micros if the corresponding option is true. This class doesn't have a reference to an OptionManager, so my guess is that the two new options must be extracted from the OptionManager when the ParquetReaderConfig instance is created.

Filtering on microsecond columns would be done using 64-bit values rather than TIME/TIMESTAMP values, e.g.

    select * from <table> where <time_column> = 1705914906694751;

I've tested the solution outlined above, and it seems to work when using sqlline and with the JDBC driver, but not with the web-based interface. Any pointers to the relevant code for that would be appreciated.

An alternative solution to the above could be to intercept all reading of the Parquet schemas and modify the schema to report the microsecond columns as int64 columns, i.e. to completely discard the information that the columns contain time/timestamp values. This could potentially make parts of the code where it is not obvious that the time/timestamp properties of columns are used behave as expected. However, this variant would not align with how INT96 timestamps are handled.

Any thoughts on this idea for how to access microsecond values would be highly appreciated.

Thanks,

/Peter
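To make the proposed usage concrete, a hedged sketch of how the new options would be exercised from SQL, assuming they land under the names proposed above (the file path and column name are illustrative):

    ALTER SESSION SET `store.parquet.reader.timestamp_micros_as_int64` = true;

    -- ts_col now comes back as a 64-bit integer of microseconds
    -- rather than a millisecond-truncated TIMESTAMP
    SELECT ts_col
    FROM dfs.`/data/events.parquet`
    WHERE ts_col = 1705914906694751;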