#general
@wcxzjtz: quick question: ```{"H3IndexFilterOperator Time":16},{"DocIdSetOperator Time":16}``` what do the numbers in the query `traceinfo` mean? is it 16ms?
@jackie.jxt: Yes, it is in millis
@jackie.jxt: FYI, `DocIdSetOperator` time includes the `H3IndexFilterOperator` time, it is hierarchical
@wcxzjtz: Oh, got it. Thanks. I was just about to ask why these two numbers add up to more than the whole query latency shown in the UI.
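[Editor's note] A minimal sketch of the point above: because the operator timings are hierarchical, a child operator's time is already contained in its parent's, so the numbers should not simply be summed. The dictionary below just reuses the trace values from the question; it is illustrative, not Pinot code.

```python
# Trace entries from the question above. Timings are in milliseconds and
# hierarchical: DocIdSetOperator's time already includes its child,
# H3IndexFilterOperator.
trace = {"H3IndexFilterOperator Time": 16, "DocIdSetOperator Time": 16}

child_ms = trace["H3IndexFilterOperator Time"]
parent_ms = trace["DocIdSetOperator Time"]

# Time spent in DocIdSetOperator itself, excluding its child.
exclusive_parent_ms = parent_ms - child_ms

print(exclusive_parent_ms)  # → 0
```

So here essentially all of the DocIdSetOperator time was spent inside the H3 index filter, which is why adding the two raw numbers overstates the total.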
@octchristmas: Hi team, I am using HDFS as the deepstore. I am trying to run batch ingestion with Spark on the deepstore HDFS cluster, but I am having difficulty using another HDFS cluster as the input of the batch job spec. Is such a deployment configuration possible?
@kharekartik: Hi, no. Currently we support only a single config per filesystem; the work for multiple configs per filesystem is WIP
@octchristmas: @kharekartik By single configuration you mean only one filesystem? Or do you mean only one kind of filesystem?
@octchristmas: Let me share my test case. I succeeded in ingesting into the deepstore HDFS by reading data from the other HDFS cluster in standalone mode with the configuration below. `executionFrameworkSpec:` `name: 'standalone'` `jobType: SegmentCreationAndTarPush` `inputDirURI: '
@octchristmas: I tried the same approach in spark mode but it failed. I've tried various things other than this configuration, but it fails every time. `executionFrameworkSpec:` `name: 'spark'` `extraConfigs:` `stagingDir: '
@kharekartik: yep, this won't work currently since we only use one of these configs for the `hdfs` scheme. Support for multiple configs based on path + scheme is work in progress
@octchristmas: @kharekartik in my tests the deployment was successful in standalone mode using another cluster as the inputDir. Is that successful test a different case?
@kharekartik: what hdfs cluster did your test pinot use?
@octchristmas: @kharekartik In the standalone mode above, the 'deepstore' cluster and the 'another cluster' are different HDFS clusters. The HDFS clusters are Cloudera.
@kharekartik: moving this conversation to DM
@methor1992: @methor1992 has joined the channel
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel
#random
@methor1992: @methor1992 has joined the channel
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel
#feat-compound-types
@slacksync: @slacksync has joined the channel
#feat-text-search
@slacksync: @slacksync has joined the channel
#feat-rt-seg-complete
@slacksync: @slacksync has joined the channel
#feat-presto-connector
@slacksync: @slacksync has joined the channel
#feat-upsert
@slacksync: @slacksync has joined the channel
#pinot-helix
@slacksync: @slacksync has joined the channel
#group-by-refactor
@slacksync: @slacksync has joined the channel
#qps-metric
@slacksync: @slacksync has joined the channel
#order-by
@slacksync: @slacksync has joined the channel
#feat-better-schema-evolution
@slacksync: @slacksync has joined the channel
#fraud
@slacksync: @slacksync has joined the channel
#pinotadls
@slacksync: @slacksync has joined the channel
#inconsistent-segment
@slacksync: @slacksync has joined the channel
#pinot-power-bi
@slacksync: @slacksync has joined the channel
@g.kishore: removed an integration from this channel:
@slacksync: @slacksync has joined the channel
#apa-16824
@slacksync: @slacksync has joined the channel
#pinot-website
@slacksync: @slacksync has joined the channel
#minion-star-tree
@slacksync: @slacksync has joined the channel
#troubleshooting
@deemish2: Hi team, I am trying to execute a backfill job using a pinot-ingestion job. Basically, I am trying to create offline segments with it. Can we fix the offline segment size without using Minion while executing the backfill job, if there is too much data? Can anyone please help with the same?
@xiaobing: if you meant such
@deemish2: I am using a regex under includeFilePattern so that it picks up all the files and generates offline segments. If there are too many records, it will create multiple offline segments. Basically, I want to generate offline segments based on record count/size using the pinot-ingestion job
@xiaobing: hmm.. for the pinot-ingestion job, I didn't see configs to tune output segment size, and it's one segment per file from the implementation. But I might have looked at the wrong code, so could you share the job spec you used so I can double-check? Thank you
@methor1992: @methor1992 has joined the channel
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@luisfernandez: hello friends!! we are encountering some issues when migrating data using the job spec. we are basically migrating a bunch of JSON files in GCS into Pinot. a JSON file looks like this: ```{"serve_time":1623110400.00000000,"p_id":8.0476135E7,"u_id":6047599.0,"i_count":1} {"serve_time":1623110400.00000000,"p_id":8.1923416E7,"u_id":5407252.0,"i_count":1,"c_count":1,"c":17}``` we end up with this exception for some of the files: ```2022/05/10 15:48:19.314 ERROR [SegmentGenerationJobRunner] [pool-2-thread-1] Failed to generate Pinot segment for file -
@diogo.baeder: Looks like a JSONL file, not a regular JSON file. If you want a JSON file with multiple rows, you need to put each row as an item of a list, instead of each dict as a line in the file. (E.g. just wrap that whole content with square brackets.)
@diogo.baeder: AFAIK Pinot doesn't support JSONL.
@luisfernandez: hey thank you, we were following this:
@diogo.baeder: Oh... then I don't know, to be honest. What I do know is that that format is not valid JSON - regular JSON parsers won't be able to read it as JSON. That format is JSONL (notice the "L" at the end), where a file has multiple lines and each line contains a valid JSON string.
@diogo.baeder: In my case, in the system I'm developing with Pinot as a database, I'm ingesting from regular JSON files, which always start and end with square brackets.
@luisfernandez: so what you are saying is that this ```{"serve_time":1623110400.00000000,"p_id":8.0476135E7,"u_id":6047599.0,"i_count":1} {"serve_time":1623110400.00000000,"p_id":8.1923416E7,"u_id":5407252.0,"i_count":1,"c_count":1,"c":17}``` should become this ```[{"serve_time":1623110400.00000000,"p_id":8.0476135E7,"u_id":6047599.0,"i_count":1}, {"serve_time":1623110400.00000000,"p_id":8.1923416E7,"u_id":5407252.0,"i_count":1,"c_count":1,"c":17}]```
@diogo.baeder: Yeah, maybe that makes it work. Just try it, if it works then that was the problem :slightly_smiling_face:
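[Editor's note] The wrapping suggested above can be scripted. This is a hypothetical helper (the function name is made up, not from the thread) that turns JSONL text, one JSON object per line, into a single JSON array:

```python
import json

def jsonl_to_json_array(jsonl_text: str) -> str:
    """Convert JSONL (one JSON object per line) into one JSON array string."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return json.dumps(rows)

# Simplified rows in the spirit of the files discussed above.
jsonl = '{"p_id": 1, "i_count": 1}\n{"p_id": 2, "i_count": 3}'
print(jsonl_to_json_array(jsonl))
# → [{"p_id": 1, "i_count": 1}, {"p_id": 2, "i_count": 3}]
```

Parsing each line and re-serializing (rather than just wrapping with brackets and commas via string concatenation) also surfaces any malformed line immediately.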
@luisfernandez: the weirdest thing is that we just tried one of the files that failed, on its own with the job spec, and it worked :smile:
@diogo.baeder: With JSONL it worked, then?
@luisfernandez: yes :smile:
@luisfernandez: for one file
@luisfernandez: then we shove a bunch of them in and it doesn't like it
@diogo.baeder: Hmmm... maybe there's an issue with one of the lines then. I noticed that one of your lines has `c_count` and `c`, where the other doesn't. Maybe them being missing is an issue? Did you set a null value for those columns?
@luisfernandez: I didn't, and I also thought about that
@luisfernandez: but the thing is that when we try to just import that one document everything works lol
@diogo.baeder: Got it. I don't know what the problem is then. What I would do in that case is a "binary search" to find the offending line - try half of the document first; if it doesn't work, cut it in half again; if it works, bring back some lines; and so on, until I find the problematic line.
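[Editor's note] The manual bisection described above can be automated: since each JSONL line should parse on its own, a short script can report every offending line directly. This is a hypothetical sketch (the function name is made up, not from the thread):

```python
import json

def find_bad_lines(jsonl_text: str):
    """Return (line_number, error_message) pairs for lines that fail to parse."""
    bad = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            bad.append((lineno, str(exc)))
    return bad

# Line 2 has a trailing comma, which is invalid JSON.
sample = '{"a": 1}\n{"a": 2,}\n{"a": 3}'
print(find_bad_lines(sample))  # reports line 2
```

This is linear rather than binary search, but for a per-line format it finds every bad line in one pass.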
@luisfernandez: we will try another data format for now
@diogo.baeder: Cool
@luisfernandez: these files are generated by spark into this json format
@luisfernandez: then we have in gcs
@diogo.baeder: Got it
@luisfernandez: `year/month/day/partfiles.json` and this is what we want to eventually put into pinot, and those are 2 years worth of data
@diogo.baeder: Got it, sounds good
@luisfernandez: this job yaml has not been super straightforward to get right lol
@luisfernandez: do you know who else may have some experience with it?
@diogo.baeder: I'm developing a system that was using the regular batch ingestion flow, but now I'm manually ingesting segment data - which also fills my offline table, but through a bit of a different process. The previous process used to work for me.
@luisfernandez: like ingestFromURI?
@diogo.baeder: What I'm using now? Yes.
@diogo.baeder: The previous flow was just the regular batch ingestion, with a job YAML config file, which I triggered via the Pinot admin CLI.
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel
#pinot-s3
@slacksync: @slacksync has joined the channel
#pinot-k8s-operator
@slacksync: @slacksync has joined the channel
#onboarding
@slacksync: @slacksync has joined the channel
#feat-geo-spatial-index
@slacksync: @slacksync has joined the channel
#transform-functions
@slacksync: @slacksync has joined the channel
#custom-aggregators
@slacksync: @slacksync has joined the channel
#inconsistent-perf
@slacksync: @slacksync has joined the channel
#docs
@slacksync: @slacksync has joined the channel
#aggregators
@slacksync: @slacksync has joined the channel
#query-latency
@slacksync: @slacksync has joined the channel
#dhill-date-seg
@slacksync: @slacksync has joined the channel
#enable-generic-offsets
@slacksync: @slacksync has joined the channel
#pinot-dev
@slacksync: @slacksync has joined the channel
#community
@slacksync: @slacksync has joined the channel
#feat-pravega-connector
@slacksync: @slacksync has joined the channel
#announcements
@slacksync: @slacksync has joined the channel
#s3-multiple-buckets
@slacksync: @slacksync has joined the channel
#release-certifier
@slacksync: @slacksync has joined the channel
#multiple_streams
@slacksync: @slacksync has joined the channel
#lp-pinot-poc
@slacksync: @slacksync has joined the channel
#roadmap
@slacksync: @slacksync has joined the channel
#presto-pinot-connector
@slacksync: @slacksync has joined the channel
#multi-region-setup
@slacksync: @slacksync has joined the channel
#metadata-push-api
@slacksync: @slacksync has joined the channel
#pql-sql-regression
@slacksync: @slacksync has joined the channel
#pinot-realtime-table-rebalance
@slacksync: @slacksync has joined the channel
#release060
@slacksync: @slacksync has joined the channel
#time-based-segment-pruner
@slacksync: @slacksync has joined the channel
#discuss-validation
@slacksync: @slacksync has joined the channel
#segment-cold-storage
@slacksync: @slacksync has joined the channel
#new-office-space
@slacksync: @slacksync has joined the channel
#config-tuner
@slacksync: @slacksync has joined the channel
#test-channel
@slacksync: @slacksync has joined the channel
#pinot-perf-tuning
@slacksync: @slacksync has joined the channel
#thirdeye-pinot
@slacksync: @slacksync has joined the channel
#getting-started
@methor1992: @methor1992 has joined the channel
@vanduc.dn: Hi all, I'm evaluating Pinot for realtime analytics for a feature in our mobile app. The total volume is about 50-100K transactions/day. Is it good to adopt Pinot? I'm afraid of putting too much engineering into it.
@g.kishore: While we would love for you to use Pinot, the data size (100k) is too small to need something like Pinot. An RDBMS like Postgres or MySQL should be a good start.
@mayanks: Is 100k per day or total? Also, what is the total retention?
@vanduc.dn: 100K is per day, Total now is 30 million records.
@vanduc.dn: Could you advise at what point it would be good to use Pinot? Besides the main topic `txn`, we also have other smaller topics which might be joined with the txn ones to show aggregation reports.
@vanduc.dn: We plan to reach 200K txns/day as our user base grows
@mayanks: 30M rows is still small, but definitely better than 100k in terms of RoI for using Pinot
@mayanks: Especially if you think it will grow soon
@mayanks: If you have real-time, and also want to expose analytics to end users, then you can definitely consider Pinot even for this scale
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel
#feat-partial-upsert
@slacksync: @slacksync has joined the channel
#pinot_website_improvement_suggestions
@slacksync: @slacksync has joined the channel
#segment-write-api
@slacksync: @slacksync has joined the channel
#releases
@slacksync: @slacksync has joined the channel
#metrics-plugin-impl
@slacksync: @slacksync has joined the channel
#debug_upsert
@slacksync: @slacksync has joined the channel
#flink-pinot-connector
@slacksync: @slacksync has joined the channel
#pinot-rack-awareness
@slacksync: @slacksync has joined the channel
#minion-improvements
@slacksync: @slacksync has joined the channel
#fix-numerical-predicate
@slacksync: @slacksync has joined the channel
#complex-type-support
@slacksync: @slacksync has joined the channel
#fix_llc_segment_upload
@slacksync: @slacksync has joined the channel
#product-launch
@slacksync: @slacksync has joined the channel
#pinot-docsrus
@slacksync: @slacksync has joined the channel
#pinot-trino
@slacksync: @slacksync has joined the channel
#kinesis_help
@slacksync: @slacksync has joined the channel
#udf-type-matching
@slacksync: @slacksync has joined the channel
#jobs
@vinichhajed: @vinichhajed has joined the channel
@slacksync: @slacksync has joined the channel
#introductions
@methor1992: @methor1992 has joined the channel
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel
#linen_dev
@g.kishore: added an integration to this channel:
@pinot-bot: @pinot-bot has joined the channel
@slacksync: @slacksync has joined the channel
@pinot-bot: @pinot-bot has left the channel
@kam: @kam has joined the channel
@kam: @xiangfu0 Pretty sure the issue is the removal on May 6th since it stopped syncing then
@xiangfu0: added an integration to this channel:
@xiangfu0: added an integration to this channel:
@xiangfu0: yes, the pro subscription stopped on that day
@kam: Ahhhh
@xiangfu0: I deleted one app and reinstalled linen
@xiangfu0: However, that doesn't seem to fix it
@kam: Hmm, still doesn't seem to be working. I'll dig a little more
@xiangfu0: got it
@xiangfu0: yeah
@xiangfu0: let’s wait for a while
@xiangfu0: Thanks for your help!
@kam: No problem!
@kam: @xiangfu0 btw if you want google to start finding the conversations you should link to apache-pinot’s linen page from either github or your landing page
@xiangfu0: you mean the website ?
@xiangfu0: like
@xiangfu0: I will add that !
@xiangfu0: I saw this
@xiangfu0: shall I click this `Add to Slack` button and retry?