Re: [DISCUSS] Some ideas for Drill 1.21

2022-02-09 Thread Ted Dunning
The planning time has been extensively analyzed.

It is inherent in a Volcano-style cost-based optimizer. This is a
branch-and-bound search of an exponential design space.

This bottleneck is very well understood.

Further, it has been accelerated under specialized conditions. As part of
OJAI, a limited form of Drill was included that could work on specific
kinds of tables built into MapR FS. With some rather severe truncation of
the space that the optimizer had to search, planning time could be reduced
to tens of milliseconds. That was fine for a limited mission, but some of
the really dramatic benefits of Drill on large queries across complex
domains would be impossible with that truncated rule set.
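[Editorial aside: to make the "exponential design space" concrete, the number of distinct ordered, bushy join trees over n relations is (2n-2)!/(n-1)!, which a cost-based optimizer must in principle consider. A quick calculation, offered here as an illustration rather than anything from Drill's code:]

```java
// Counts the distinct (ordered, bushy) binary join trees over n relations.
// The closed form is (2n-2)!/(n-1)!, i.e. the product of n..(2n-2).
public class JoinTrees {
  public static long count(int n) {
    long result = 1;
    for (long k = n; k <= 2L * n - 2; k++) {
      result *= k;   // accumulate (2n-2)! / (n-1)!
    }
    return result;
  }
}
```

For 3 relations there are 12 trees; for 5 there are already 1,680, which is why an untruncated rule set gets expensive so quickly.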



On Wed, Feb 9, 2022 at 7:06 PM Paul Rogers  wrote:

> Hi All,
>
> Would be great to understand the source of the slow planning. Back in the
> day, I recall colleagues trying all kinds of things to speed up planning,
> but without the time to really figure out where the time went.
>
> I wonder if the two points are related. If most of that planning time is
> spent waiting for plugin metadata, then James' & Charles' issue could
> possibly be the cause of the slowness that Ted saw.
>
> James, it is still not clear what plugin metadata is being retrieved, and
> when. Now, it is hard to figure that out; that code is complex. Ideally, if
> you have a dozen plugins enabled, but query only one, then only that one
> should be doing anything. Further, if you're using an external system (like
> JDBC), the plugin should query the remote system tables only for the
> table(s) you hit in your query. If the code asks ALL plugins for
> information, or grabs all tables from the remote system, then, yeah, it's
> going to be slow.
>
> Adding per-plugin caching might make sense. For JDBC, say, it is not likely
> that the schema of the remote DB changes between queries, so caching for
> some amount of time is probably fine. And, if a query asks for an unknown
> column, the plugin could refresh metadata to see if the column was just
> added. (I was told that Impala users constantly had to run REFRESH METADATA
> to pick up new files added to HDFS.)
>
> For the classic, original use case (Parquet or CSV files on an HDFS-like
> system), the problem was the need to scan the directory structure at plan
> time to figure out which files to scan at run time. For Parquet, the
> planner also wants to do Parquet row group pruning, which requires reading
> the header of every one of the target files. Since this was slow, Drill
> would create a quick & dirty cache, but with large numbers of files, even
> reading that cache was slow (and, Drill would rebuild it any time a
> directory changed, which greatly slowed planning.)
>
> For that classic use case, saved plans never seemed a win because the
> "shape" of the query heavily depended on the WHERE clause: one clause might
> hit a small set of files, another hit a large set, and that then throws off
> join planning, hash/broadcast exchange decisions and so on.
>
> So, back to the suggestion to start with understanding where the time goes.
> Any silly stuff we can just stop doing? Is the cost due to external
> factors, such as those cited above? Or, is Calcite itself just heavy
> weight? Calcite is a rules engine. Add more rules or more nodes in the DAG,
> and the cost of planning rises steeply. So, are we fiddling about too much
> in the planning process?
>
> One way to test: use a mock data source and plan-time components to
> eliminate all external factors. Time various query shapes using EXPLAIN.
> How long does Calcite take? If a long time, then we've got a rather
> difficult problem as Calcite is hard to fix/replace.
>
> Then, time the plugins of interest. Figure out how to optimize those.
>
> My guess is that the bottleneck won't turn out to be what we think it is.
> It usually isn't.
>
> - Paul
>
> On Tue, Feb 8, 2022 at 8:19 AM Ted Dunning  wrote:
>
> > James, you make some good points.
> >
> > I would generally support what you say except for one special case. I
> think
> > that there is a case to be made to be able to cache query plans in some
> > fashion.
> >
> > The traditional approach to do this is to use "prepared queries" by which
> > the application signals that it is willing to trust that a query plan
> will
> > continue to be correct for the duration of its execution. My experience
> > (and I think the industry's as well) is that the query plan is more
> stable
> > than the underlying details of the metadata and this level of caching (or
> > more) is a very good idea.
> >
> > In particular, the benefit to Drill is that we have a very expensive
> query
> > planning phase (I have seen numbers in the range 200-800ms routinely)
> but I
> > have seen execution times that are as low as a few 10's of ms. This
> > imbalance severely compromises the rate of concurrent querying for fast
> > queries. Having some form of plan caching would allow this planning
> > 

[jira] [Resolved] (DRILL-8129) Storage-phoenix cannot resolve OSGi bundle apache-ds.jdbm1

2022-02-09 Thread James Turton (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Turton resolved DRILL-8129.
-
Resolution: Fixed

> Storage-phoenix cannot resolve OSGi bundle apache-ds.jdbm1
> --
>
> Key: DRILL-8129
> URL: https://issues.apache.org/jira/browse/DRILL-8129
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Blocker
> Fix For: 1.20.0
>
>
> Because this dependency is of type "bundle", the module requires the 
> maven-bundle-plugin in order to resolve it, and for the module to build.
>  
> {code:java}
> [ERROR] Failed to execute goal on project drill-storage-phoenix: Could not 
> resolve dependencies for project 
> org.apache.drill.contrib:drill-storage-phoenix:jar:1.20.0-SNAPSHOT: Failure 
> to find org.apache.directory.jdbm:apacheds-jdbm1:bundle:2.0.0-M2 in 
> https://conjars.org/repo was cached in the local repository, resolution will 
> not be reattempted until the update interval of conjars has elapsed or 
updates are forced -> [Help 1]{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [drill] jnturton merged pull request #2457: DRILL-8129: Storage-phoenix cannot resolve OSGi bundle apache-ds.jdbm1

2022-02-09 Thread GitBox


jnturton merged pull request #2457:
URL: https://github.com/apache/drill/pull/2457


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: [DISCUSS] Some ideas for Drill 1.21

2022-02-09 Thread Paul Rogers
Hi All,

Would be great to understand the source of the slow planning. Back in the
day, I recall colleagues trying all kinds of things to speed up planning,
but without the time to really figure out where the time went.

I wonder if the two points are related. If most of that planning time is
spent waiting for plugin metadata, then James' & Charles' issue could
possibly be the cause of the slowness that Ted saw.

James, it is still not clear what plugin metadata is being retrieved, and
when. Now, it is hard to figure that out; that code is complex. Ideally, if
you have a dozen plugins enabled, but query only one, then only that one
should be doing anything. Further, if you're using an external system (like
JDBC), the plugin should query the remote system tables only for the
table(s) you hit in your query. If the code asks ALL plugins for
information, or grabs all tables from the remote system, then, yeah, it's
going to be slow.

Adding per-plugin caching might make sense. For JDBC, say, it is not likely
that the schema of the remote DB changes between queries, so caching for
some amount of time is probably fine. And, if a query asks for an unknown
column, the plugin could refresh metadata to see if the column was just
added. (I was told that Impala users constantly had to run REFRESH METADATA
to pick up new files added to HDFS.)
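[Editorial aside: a per-plugin schema cache of the kind Paul describes, with a time-to-live and a refresh when a queried column is unknown, might look roughly like the following. This is a sketch; the class and method names are hypothetical, not Drill's actual API.]

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical per-plugin schema cache. Entries expire after a TTL, and a
// lookup for an unknown column forces a refresh -- covering the case where
// a column was just added on the remote system.
public class SchemaCache {
  private final long ttlMillis;
  private final Supplier<Set<String>> loader; // fetches column names from the remote system
  private volatile Set<String> columns;
  private volatile long loadedAt = Long.MIN_VALUE;

  public SchemaCache(long ttlMillis, Supplier<Set<String>> loader) {
    this.ttlMillis = ttlMillis;
    this.loader = loader;
  }

  private boolean stale() {
    return columns == null || System.currentTimeMillis() - loadedAt > ttlMillis;
  }

  /** Returns true if the column is known, refreshing the cache when stale or on a miss. */
  public synchronized boolean hasColumn(String name) {
    if (stale() || !columns.contains(name)) {
      columns = loader.get();                 // refresh metadata from the remote system
      loadedAt = System.currentTimeMillis();
    }
    return columns.contains(name);
  }
}
```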

For the classic, original use case (Parquet or CSV files on an HDFS-like
system), the problem was the need to scan the directory structure at plan
time to figure out which files to scan at run time. For Parquet, the
planner also wants to do Parquet row group pruning, which requires reading
the header of every one of the target files. Since this was slow, Drill
would create a quick & dirty cache, but with large numbers of files, even
reading that cache was slow (and, Drill would rebuild it any time a
directory changed, which greatly slowed planning.)

For that classic use case, saved plans never seemed a win because the
"shape" of the query heavily depended on the WHERE clause: one clause might
hit a small set of files, another hit a large set, and that then throws off
join planning, hash/broadcast exchange decisions and so on.

So, back to the suggestion to start with understanding where the time goes.
Any silly stuff we can just stop doing? Is the cost due to external
factors, such as those cited above? Or, is Calcite itself just heavy
weight? Calcite is a rules engine. Add more rules or more nodes in the DAG,
and the cost of planning rises steeply. So, are we fiddling about too much
in the planning process?

One way to test: use a mock data source and plan-time components to
eliminate all external factors. Time various query shapes using EXPLAIN.
How long does Calcite take? If a long time, then we've got a rather
difficult problem as Calcite is hard to fix/replace.

Then, time the plugins of interest. Figure out how to optimize those.

My guess is that the bottleneck won't turn out to be what we think it is.
It usually isn't.

- Paul

On Tue, Feb 8, 2022 at 8:19 AM Ted Dunning  wrote:

> James, you make some good points.
>
> I would generally support what you say except for one special case. I think
> that there is a case to be made to be able to cache query plans in some
> fashion.
>
> The traditional approach to do this is to use "prepared queries" by which
> the application signals that it is willing to trust that a query plan will
> continue to be correct for the duration of its execution. My experience
> (and I think the industry's as well) is that the query plan is more stable
> than the underlying details of the metadata and this level of caching (or
> more) is a very good idea.
>
> In particular, the benefit to Drill is that we have a very expensive query
> planning phase (I have seen numbers in the range 200-800ms routinely) but I
> have seen execution times that are as low as a few 10's of ms. This
> imbalance severely compromises the rate of concurrent querying for fast
> queries. Having some form of plan caching would allow this planning
> overhead to drop to zero in select cases.
>
> I have been unable to even consider working on this problem, but it seems
> that one interesting heuristic would be based on two factors:
> - the ratio of execution time to planning time
> The rationale is that if a query takes much longer to run than to plan, we
> might as well do planning each time. Conversely, if a query takes much less
> time to run than it takes to plan, it is very important to avoid that
> planning time.
>
> - the degree to which recent execution times seem inconsistent with longer
> history
> The rationale here is that a persistent drop in performance for a query is
> a strong indicator that any cached plan is no longer valid and should be
> updated. Conversely, if recent query history is consistent with long-term
> history, that is a vote of confidence for the plan. Furthermore, depending
> on how this is implemented the chance of a false positive change detec
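[Editorial aside: the two heuristics Ted sketches above, cache when planning dominates execution, and invalidate when recent runtimes drift from history, could be expressed roughly as follows. The threshold values are assumptions chosen only for illustration.]

```java
// Hypothetical plan-cache policy implementing the two factors described:
// (1) the ratio of planning time to execution time, and
// (2) drift of recent execution times away from long-term history.
public class PlanCachePolicy {
  /** Cache when planning costs well more than execution (2x is an assumed threshold). */
  public static boolean worthCaching(double planMillis, double execMillis) {
    return planMillis > 2.0 * execMillis;
  }

  /** Treat a persistent slowdown as evidence the cached plan is stale (1.5x assumed). */
  public static boolean looksStale(double longTermMeanMillis, double recentMeanMillis) {
    return recentMeanMillis > 1.5 * longTermMeanMillis;
  }
}
```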

[GitHub] [drill] vvysotskyi commented on a change in pull request #2457: DRILL-8129: Storage-phoenix cannot resolve OSGi bundle apache-ds.jdbm1

2022-02-09 Thread GitBox


vvysotskyi commented on a change in pull request #2457:
URL: https://github.com/apache/drill/pull/2457#discussion_r803020109



##
File path: contrib/storage-phoenix/pom.xml
##
@@ -326,6 +330,12 @@
   -Xms2048m -Xmx2048m
 
   
+  
+org.apache.felix

Review comment:
   Ok, thanks for the explanation. Yes, looks like `MiniKdc` depends on 
this library so it cannot be excluded. Could you please add this plugin under 
the hadoop-2 profile?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] jnturton commented on a change in pull request #2457: DRILL-8129: Storage-phoenix cannot resolve OSGi bundle apache-ds.jdbm1

2022-02-09 Thread GitBox


jnturton commented on a change in pull request #2457:
URL: https://github.com/apache/drill/pull/2457#discussion_r802998151



##
File path: contrib/storage-phoenix/pom.xml
##
@@ -326,6 +330,12 @@
   -Xms2048m -Xmx2048m
 
   
+  
+org.apache.felix

Review comment:
   @vvysotskyi Without it I cannot build storage-phoenix using `-Phadoop-2`:
   
   ```
   [INFO] 

   [INFO] BUILD FAILURE
   [INFO] 

   [INFO] Total time:  2.275 s (Wall Clock)
   [INFO] Finished at: 2022-02-09T19:44:28+02:00
   [INFO] 

   [ERROR] Failed to execute goal on project drill-storage-phoenix: Could not 
resolve dependencies for project 
org.apache.drill.contrib:drill-storage-phoenix:jar:1.20.0-SNAPSHOT: Failure to 
find org.apache.directory.jdbm:apacheds-jdbm1:bundle:2.0.0-M2 in 
https://conjars.org/repo was cached in the local repository, resolution will 
not be reattempted until the update interval of conjars has elapsed or updates 
are forced -> [Help 1]
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [drill] vvysotskyi commented on a change in pull request #2457: DRILL-8129: Storage-phoenix cannot resolve OSGi bundle apache-ds.jdbm1

2022-02-09 Thread GitBox


vvysotskyi commented on a change in pull request #2457:
URL: https://github.com/apache/drill/pull/2457#discussion_r802978299



##
File path: contrib/storage-phoenix/pom.xml
##
@@ -326,6 +330,12 @@
   -Xms2048m -Xmx2048m
 
   
+  
+org.apache.felix

Review comment:
   Could you please clarify what is the reason for adding this plugin?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] jnturton opened a new pull request #2457: DRILL-8129: Storage-phoenix cannot resolve OSGi bundle apache-ds.jdbm1

2022-02-09 Thread GitBox


jnturton opened a new pull request #2457:
URL: https://github.com/apache/drill/pull/2457


   # [DRILL-8129](https://issues.apache.org/jira/browse/DRILL-8129): 
Storage-phoenix cannot resolve OSGi bundle apache-ds.jdbm1
   
   ## Description
   
   Because this dependency is of type "bundle", the module requires the 
maven-bundle-plugin in order to resolve it, and for the module to build.
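   [Editorial aside: the fix described, registering the maven-bundle-plugin so Maven can resolve "bundle"-type dependencies, looks roughly like the following addition to contrib/storage-phoenix/pom.xml. The org.apache.felix groupId matches the diff in the review thread; the version number shown is an assumption.]

   ```xml
   <plugin>
     <groupId>org.apache.felix</groupId>
     <artifactId>maven-bundle-plugin</artifactId>
     <version>5.1.2</version>
     <!-- extensions=true lets Maven resolve dependencies of type "bundle" -->
     <extensions>true</extensions>
   </plugin>
   ```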
   
   ## Documentation
   N/A
   
   ## Testing
   Build Drill under the default profile and under -Phadoop-2
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (DRILL-8129) Storage-phoenix cannot resolve OSGi bundle apache-ds.jdbm1

2022-02-09 Thread James Turton (Jira)
James Turton created DRILL-8129:
---

 Summary: Storage-phoenix cannot resolve OSGi bundle apache-ds.jdbm1
 Key: DRILL-8129
 URL: https://issues.apache.org/jira/browse/DRILL-8129
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.20.0
Reporter: James Turton
Assignee: James Turton
 Fix For: 1.20.0


Because this dependency is of type "bundle", the module requires the 
maven-bundle-plugin in order to resolve it, and for the module to build.

 
{code:java}
[ERROR] Failed to execute goal on project drill-storage-phoenix: Could not 
resolve dependencies for project 
org.apache.drill.contrib:drill-storage-phoenix:jar:1.20.0-SNAPSHOT: Failure to 
find org.apache.directory.jdbm:apacheds-jdbm1:bundle:2.0.0-M2 in 
https://conjars.org/repo was cached in the local repository, resolution will 
not be reattempted until the update interval of conjars has elapsed or updates 
are forced -> [Help 1]{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] Release Apache Drill 1.20.0 - RC1

2022-02-09 Thread James Turton
Thanks everyone for testing.  It turns out I've broken a couple of 
points of Git and Maven release protocol, in large part from my efforts 
to release two builds.  An RC 2 is now being prepared.


On 2022/02/09 18:53, Vova Vysotskyi wrote:

Hi James!

Thanks for doing the RC so rapidly!

I was verifying the previous RC, and after this one was announced I switched to it. 
The release should be based on the commit generated by Maven 
([maven-release-plugin] prepare release drill-XXX), and for the previous 
release candidate that was so, but for this one the commit id refers to the 
previous commit (DRILL-8126: Ignore OAuth Parameter in Storage Plugin).
Running the select * from sys.version; query from Drill also returns that commit, 
but in this case it is strange that the version was 1.20.0, since those changes 
weren't committed in the branch whose head is 
73a829a5a0eb21fc35d6cfd878310b7069135ecd...

Kind regards,
Volodymyr Vysotskyi

On 2022/02/08 14:54:36 James Turton wrote:

Hi all

 Note from the release manager.

I'll undertake to add an Hadoop 2 release candidate shortly.  I have
checked and the issue found in RC0 (DRILL-8126) is fixed.  That is the
only change between this RC and the previous one.

- Thanks, James

I'd like to propose the second release candidate (RC1) of Apache Drill,
version 1.20.0.

The release candidate covers a total of 106 resolved JIRAs [1]. Thanks
to everyone who contributed to this release.

The tarball artifacts are hosted at [2] and the maven artifacts are
hosted at [3].

This release candidate is based on commit
73a829a5a0eb21fc35d6cfd878310b7069135ecd located at [4].

Please download and try out the release.

[ ] +1
[ ] +0
[ ] -1

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301&projectId=12313820
[2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-rc1/
[3] https://repository.apache.org/content/repositories/orgapachedrill-1088/
[4] https://github.com/jnturton/drill/commits/drill-1.20.0





Re: [VOTE] Release Apache Drill 1.20.0 - RC1

2022-02-09 Thread Vova Vysotskyi
Hi James!

Thanks for doing the RC so rapidly!

I was verifying the previous RC, and after this one was announced I switched to it. 
The release should be based on the commit generated by Maven 
([maven-release-plugin] prepare release drill-XXX), and for the previous 
release candidate that was so, but for this one the commit id refers to the 
previous commit (DRILL-8126: Ignore OAuth Parameter in Storage Plugin).
Running the select * from sys.version; query from Drill also returns that commit, 
but in this case it is strange that the version was 1.20.0, since those changes 
weren't committed in the branch whose head is 
73a829a5a0eb21fc35d6cfd878310b7069135ecd...

Kind regards,
Volodymyr Vysotskyi

On 2022/02/08 14:54:36 James Turton wrote:
> Hi all
> 
>  Note from the release manager.
> 
> I'll undertake to add an Hadoop 2 release candidate shortly.  I have 
> checked and the issue found in RC0 (DRILL-8126) is fixed.  That is the 
> only change between this RC and the previous one.
> 
> - Thanks, James
> 
> I'd like to propose the second release candidate (RC1) of Apache Drill, 
> version 1.20.0.
> 
> The release candidate covers a total of 106 resolved JIRAs [1]. Thanks 
> to everyone who contributed to this release.
> 
> The tarball artifacts are hosted at [2] and the maven artifacts are 
> hosted at [3].
> 
> This release candidate is based on commit 
> 73a829a5a0eb21fc35d6cfd878310b7069135ecd located at [4].
> 
> Please download and try out the release.
> 
> [ ] +1
> [ ] +0
> [ ] -1
> 
> [1] 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301&projectId=12313820
> [2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-rc1/
> [3] https://repository.apache.org/content/repositories/orgapachedrill-1088/
> [4] https://github.com/jnturton/drill/commits/drill-1.20.0
> 


Re: [VOTE] Release Apache Drill 1.20.0 - RC1

2022-02-09 Thread James Turton
The storage-phoenix plugin put up an unexpected final fight when it was 
asked to build under the -Phadoop-2 profile but I think that all is now 
okay with the Hadoop 2 build.


Tarballs 
https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-hadoop-2-rc1/

Git tag https://github.com/jnturton/drill/commits/drill-1.20.0-hadoop-2
Maven 
https://repository.apache.org/content/repositories/orgapachedrill-1089/


Please test this build too, especially if you have an Hadoop 2 
environment handy.


So far I have only been able to produce this additional build in the 
form of an entirely new release with version 1.20.0-hadoop-2. I did not 
see a way to avoid a new release given the tools we use today but if 
there are maven-release-plugin secrets that I need to be taught please 
don't hesitate to do that.


A downside of the -hadoop-2 version number I've generated is that I 
believe that 1.20.0-hadoop-2 > 1.20.0 in Maven's eyes, an inequality 
which could possibly do something weird to someone out there without 
pinned dependency versions. An alternative to drill-1.20.0-hadoop-2 
(version number modified) that was considered was drill-hadoop-2-1.20.0 
(package name modified), we can discuss that if you'd like.


James

On 2022/02/08 16:54, James Turton wrote:

Hi all

 Note from the release manager.

I'll undertake to add an Hadoop 2 release candidate shortly.  I have 
checked and the issue found in RC0 (DRILL-8126) is fixed. That is the 
only change between this RC and the previous one.


- Thanks, James

I'd like to propose the second release candidate (RC1) of Apache 
Drill, version 1.20.0.


The release candidate covers a total of 106 resolved JIRAs [1]. Thanks 
to everyone who contributed to this release.


The tarball artifacts are hosted at [2] and the maven artifacts are 
hosted at [3].


This release candidate is based on commit 
73a829a5a0eb21fc35d6cfd878310b7069135ecd located at [4].


Please download and try out the release.

[ ] +1
[ ] +0
[ ] -1

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301&projectId=12313820

[2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-rc1/
[3] 
https://repository.apache.org/content/repositories/orgapachedrill-1088/

[4] https://github.com/jnturton/drill/commits/drill-1.20.0




[GitHub] [drill] rymarm opened a new pull request #2456: DRILL-8122: Change kafka metadata obtaining due to KAFKA-5697

2022-02-09 Thread GitBox


rymarm opened a new pull request #2456:
URL: https://github.com/apache/drill/pull/2456


   # [DRILL-8122](https://issues.apache.org/jira/browse/DRILL-8122): Change 
kafka metadata obtaining due to KAFKA-5697
   
   ## Description
   
   
[`Consumer#poll(long)`](https://javadoc.io/static/org.apache.kafka/kafka-clients/3.1.0/org/apache/kafka/clients/consumer/Consumer.html#poll-long-)
 is deprecated starting from Kafka 2.0. In Drill, `Consumer#poll` is used in 2 
places:
   1. [For its direct purpose
   
](https://github.com/apache/drill/blob/15b2f52260e4f0026f2dfafa23c5d32e0fb66502/contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/MessageIterator.java#L82)
   2. As the only way to make a Kafka consumer [update metadata
   
](https://github.com/apache/drill/blob/15b2f52260e4f0026f2dfafa23c5d32e0fb66502/contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaGroupScan.java#L185)
   
   Kafka [hasn't 
implemented](https://cwiki.apache.org/confluence/display/KAFKA/KIP-505%3A+Add+new+public+method+to+only+update+assignment+metadata+in+consumer)
 a separate method to update metadata, and the new 
[Consumer#poll(Duration)](https://javadoc.io/static/org.apache.kafka/kafka-clients/3.1.0/org/apache/kafka/clients/consumer/Consumer.html#poll-java.time.Duration-)
 doesn't work with the hack that Drill uses, `poll(0)`, due to changed logic: 
https://github.com/apache/kafka/pull/4855. That is why I had to use a loop 
with a timeout to work around the absent separate method.
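   [Editorial aside: the poll-until-metadata-arrives loop described above can be sketched generically as follows. This is not Drill's actual code; the `pollAndCheck` supplier stands in for `consumer.poll(Duration)` followed by a check of the consumer's partition metadata, and the back-off interval is an assumption.]

```java
import java.time.Duration;
import java.util.function.Supplier;

// Generic sketch of the workaround: with no public "update metadata only"
// method in the Kafka consumer API, keep issuing short polls until the
// needed metadata appears or a deadline passes.
public class MetadataWait {
  public static boolean awaitMetadata(Supplier<Boolean> pollAndCheck, Duration timeout) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (System.nanoTime() < deadline) {
      if (pollAndCheck.get()) {
        return true;               // metadata arrived
      }
      try {
        Thread.sleep(50);          // brief back-off between polls
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;                  // timed out without metadata
  }
}
```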
   ## Documentation
   \-
   
   ## Testing
   Unit tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org