#general


@wcxzjtz: quick question: ```{"H3IndexFilterOperator Time":16},{"DocIdSetOperator Time":16}``` what do the numbers mean in the query `traceInfo`? Is it 16 ms?
  @jackie.jxt: Yes, it is in millis
  @jackie.jxt: FYI, `DocIdSetOperator` time includes the `H3IndexFilterOperator` time, it is hierarchical
  @wcxzjtz: Oh, got it. Thanks. I was just about to ask why those two numbers added up to more than the whole query latency shown in the UI.
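Because the reported times are hierarchical (a parent operator's time includes its children's), summing the raw numbers double-counts. A minimal sketch of how to derive per-operator self time; the operator tree below is illustrative, not Pinot's actual trace format:

```python
# Sketch: compute exclusive (self) time from hierarchical (inclusive)
# operator timings, so the numbers can be summed without double-counting.
# The parent->children layout here is assumed for illustration only.

inclusive_ms = {
    "DocIdSetOperator": 16,
    "H3IndexFilterOperator": 16,
}
children = {
    "DocIdSetOperator": ["H3IndexFilterOperator"],
    "H3IndexFilterOperator": [],
}

def exclusive_ms(op):
    # Self time = inclusive time minus the inclusive time of direct children.
    return inclusive_ms[op] - sum(inclusive_ms[c] for c in children[op])

for op in inclusive_ms:
    print(op, exclusive_ms(op))
```

Here `DocIdSetOperator` spends essentially all of its 16 ms inside `H3IndexFilterOperator`, which is why the two values should not be added together.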
@octchristmas: Hi team, I am using HDFS as the deep store. I am trying to run batch ingestion with Spark on the deep store HDFS cluster, but I am having difficulty using another HDFS cluster as the input of the batch job spec. Is such a deployment configuration possible?
  @kharekartik: Hi, no. Currently we support only a single config per filesystem; the work for multiple configs per filesystem is WIP
  @octchristmas: @kharekartik By a single configuration, do you mean only one filesystem? Or only one kind of filesystem?
  @octchristmas: Let me share my test case. I succeeded in ingesting into the deep store HDFS by reading data from the other HDFS in standalone mode with the configuration below:
```
executionFrameworkSpec:
  name: 'standalone'
jobType: SegmentCreationAndTarPush
inputDirURI: ''
pinotFSSpecs:
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      hadoop.conf.path: '[deepstore hdfs hadoop config path]'
      hadoop.kerberos.principle: '[kerberos principal]'
      hadoop.kerberos.keytab: '[local filesystem keytab path]'
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      hadoop.conf.path: '[another cluster hdfs hadoop config path]'
      hadoop.kerberos.principle: '[kerberos principal]'
      hadoop.kerberos.keytab: '[local filesystem keytab path]'
```
  @octchristmas: I tried the same approach in spark mode but it failed. I've tried various things other than this configuration, but it fails every time:
```
executionFrameworkSpec:
  name: 'spark'
  extraConfigs:
    stagingDir: ''
inputDirURI: ''
outputDirURI: ''
pinotFSSpecs:
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      hadoop.kerberos.principle: '[kerberos principal]'
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      hadoop.conf.path: ''
      hadoop.kerberos.principle: '[kerberos principal]'
```
  @kharekartik: Yep, this won't work currently, since we only use one of these configs for the `hdfs` scheme. Support for multiple configs based on path + scheme is work in progress
  @octchristmas: @kharekartik in my tests the deployment was successful in standalone mode using another cluster as inputDir. Is the configuration of this successful test a different case?
  @kharekartik: what hdfs cluster did your test pinot use?
  @octchristmas: @kharekartik In standalone mode above, the 'deepstore' and 'another cluster' are different HDFS clusters. The HDFS cluster is Cloudera.
  @kharekartik: moving this conversation to DM
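The limitation described above can be illustrated with a toy sketch: if filesystem configs are registered in a map keyed only by scheme, a second `hdfs` entry simply replaces the first, so only one cluster's settings are ever in effect. This is a simplified model of the behavior, not Pinot's actual plugin code:

```python
# Sketch of why two pinotFSSpecs with the same scheme collide:
# a registry keyed only by scheme keeps just the last entry.
registry = {}

specs = [
    {"scheme": "hdfs", "configs": {"hadoop.conf.path": "/deepstore/conf"}},
    {"scheme": "hdfs", "configs": {"hadoop.conf.path": "/other-cluster/conf"}},
]

for spec in specs:
    # The second 'hdfs' entry overwrites the first.
    registry[spec["scheme"]] = spec["configs"]

print(registry["hdfs"]["hadoop.conf.path"])  # only the last config survives
```

Routing by path + scheme (the work-in-progress mentioned above) would instead key the registry on something like `(scheme, path prefix)`, letting both clusters coexist.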
@methor1992: @methor1992 has joined the channel
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel

#troubleshooting


@deemish2: Hi team, I am trying to execute a backfill job using the pinot-ingestion job. Basically, I am trying to create offline segments with it. Can we fix the offline segment size without using Minion while executing the backfill job, if there is too much data? Can anyone please help with this?
  @xiaobing: if you meant such , it looks like it has to ingest one file into one segment, but you could spread the input data across multiple files as a workaround, if that's feasible
  @deemish2: I am using a regex under includeFilePattern so that it takes all the files and generates offline segments. If there are too many records, it will create multiple offline segments. Basically, I want to generate offline segments based on the number of records / size using the pinot-ingestion job
  @xiaobing: hmm.. for the pinot-ingestion job, I didn't see configs to tune the output segment size, and it's one segment per file from the implementation. But I might have looked at the wrong code, so could you share the job spec you used so I can double-check? Thank you
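Given the one-segment-per-input-file behavior described above, the workaround is to pre-split the input by row count before running the ingestion job. A rough sketch, assuming newline-delimited input files; the chunk size, paths, and naming are made up, and a real splitter may need to be format-aware (e.g. preserving CSV headers):

```python
import os

def split_file(path, out_dir, rows_per_chunk=100_000):
    """Split a newline-delimited file into bounded chunks so that a
    one-segment-per-file ingestion job yields bounded segment sizes."""
    os.makedirs(out_dir, exist_ok=True)
    chunk, idx, written = [], 0, []
    with open(path) as f:
        for line in f:
            chunk.append(line)
            if len(chunk) >= rows_per_chunk:
                out = os.path.join(out_dir, f"part-{idx:05d}.json")
                with open(out, "w") as o:
                    o.writelines(chunk)
                written.append(out)
                chunk, idx = [], idx + 1
    if chunk:  # flush the final partial chunk
        out = os.path.join(out_dir, f"part-{idx:05d}.json")
        with open(out, "w") as o:
            o.writelines(chunk)
        written.append(out)
    return written
```

Pointing the job spec's `inputDirURI` at the split output directory would then produce one segment per chunk, bounding each segment's record count.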
@methor1992: @methor1992 has joined the channel
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@luisfernandez: hello friends!! we are encountering some issues when migrating data using the job spec. We are basically migrating a bunch of JSON files in GCS into Pinot. A JSON file looks like this: ```{"serve_time":1623110400.00000000,"p_id":8.0476135E7,"u_id":6047599.0,"i_count":1} {"serve_time":1623110400.00000000,"p_id":8.1923416E7,"u_id":5407252.0,"i_count":1,"c_count":1,"c":17}``` we end up with this exception for some of the files: ```2022/05/10 15:48:19.314 ERROR [SegmentGenerationJobRunner] [pool-2-thread-1] Failed to generate Pinot segment for file - lau_tmp/raw_data/date=2020-07-25/part-00168-c741f867-338d-4c84-afaf-428f85c14088.c000.json java.lang.RuntimeException: Unexpected end-of-input within/between Object entries``` do you know why we may be getting these errors?
  @diogo.baeder: Looks like a JSONL file, not a regular JSON file. If you want a JSON file with multiple rows, you need to put each row as an item of a list, instead of each dict as a line in the file. (E.g. just wrap that whole content with square brackets.)
  @diogo.baeder: AFAIK Pinot doesn't support JSONL.
  @luisfernandez: hey thank you, we were following this: , and it seems it should be ok?
  @diogo.baeder: Oh... then I don't know, to be honest. What I do know is that that format is not valid JSON; regular JSON parsers won't be able to read it. That format is JSONL (note the "L" at the end), where a file has multiple lines and each line contains a valid JSON string.
  @diogo.baeder: In my case, in the system I'm developing with Pinot as a database, I'm ingesting from regular JSON files, which always start and end with square brackets.
  @luisfernandez: so what you are saying is that this ```{"serve_time":1623110400.00000000,"p_id":8.0476135E7,"u_id":6047599.0,"i_count":1} {"serve_time":1623110400.00000000,"p_id":8.1923416E7,"u_id":5407252.0,"i_count":1,"c_count":1,"c":17}``` should become this ```[{"serve_time":1623110400.00000000,"p_id":8.0476135E7,"u_id":6047599.0,"i_count":1}, {"serve_time":1623110400.00000000,"p_id":8.1923416E7,"u_id":5407252.0,"i_count":1,"c_count":1,"c":17}]```
  @diogo.baeder: Yeah, maybe that makes it work. Just try it, if it works then that was the problem :slightly_smiling_face:
  @luisfernandez: the weirdest thing is that we just tried with one of the files that failed we tried that one file in particular with the job spec and it worked :smile:
  @diogo.baeder: With JSONL it worked, then?
  @luisfernandez: yes :smile:
  @luisfernandez: for one file
  @luisfernandez: then we shove a bunch of them in and it doesn’t like it
  @diogo.baeder: Hmmm... maybe there's an issue with one of the lines then. I noticed that one of your lines has `c_count` and `c`, where the other doesn't. Maybe them being missing is an issue? Did you set a null value for those columns?
  @luisfernandez: i didn’t, and i also thought about that
  @luisfernandez: but the thing is that when we try to just import that one document everything works lol
  @diogo.baeder: Got it. I don't know what the problem is then. What I would do in that case is a "binary search" to find the offending line: try half of the document first; if it doesn't work, cut it in half again; if it works, bring back some lines; and so on until I find the problematic line.
  @luisfernandez: we will try another data format for now
  @diogo.baeder: Cool
  @luisfernandez: these files are generated by spark into this json format
  @luisfernandez: then we have in gcs
  @diogo.baeder: Got it
  @luisfernandez: `year/month/day/partfiles.json` and this is what we want to eventually put into pinot, and those are 2 years worth of data
  @diogo.baeder: Got it, sounds good
  @luisfernandez: this job yaml has not been super straightforward to get right lol
  @luisfernandez: do you know who else may have some experience with it?
  @diogo.baeder: I'm developing a system that was using the regular batch ingestion flow, but now I'm manually ingesting segment data - which also fills my offline table, but through a bit of a different process. The previous process used to work for me.
  @luisfernandez: like ingestFromURI?
  @diogo.baeder: What I'm using now? Yes.
  @diogo.baeder: The previous flow was just the regular batch ingestion, with a job YAML config file, which I triggered via the Pinot admin CLI.
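The two suggestions in this thread (wrapping the JSONL lines into a JSON array, and hunting for a malformed line) can be combined in one small script: parse each line individually so a bad line is reported with its number, then emit a single JSON array. A sketch, not tied to any particular Pinot configuration:

```python
import json

def jsonl_to_json_array(lines):
    """Parse each JSONL line separately so a malformed line is reported
    with its line number, then return all rows as one JSON array string."""
    rows = []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than failing on them
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as e:
            raise ValueError(f"line {i} is not valid JSON: {e}") from e
    return json.dumps(rows)

# Usage (hypothetical file name):
#     json_text = jsonl_to_json_array(open("part-00168.json"))
```

This would both surface the line behind the "Unexpected end-of-input" error and produce a regular JSON array file as a fallback format.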
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel

#getting-started


@methor1992: @methor1992 has joined the channel
@vanduc.dn: Hi all, I'm evaluating Pinot for realtime analytics for a feature in our mobile app. The total is about 50-100K transactions/day. Is it a good idea to adopt Pinot? I'm afraid of too much engineering effort.
  @g.kishore: While we would love for you to use Pinot, the data size (100k) is too small to need something like Pinot. An RDBMS like Postgres or MySQL should be a good start.
  @mayanks: Is 100k per day or total? Also, what is the total retention?
  @vanduc.dn: 100K is per day, Total now is 30 million records.
  @vanduc.dn: Could you advise at what point it would be worth adopting Pinot? Besides the main `txn` topic, we also have other smaller topics which might be joined with the txn one to show aggregation reports.
  @vanduc.dn: We plan to reach 200K txns/day as users grow
  @mayanks: 30M rows is still small, but definitely better than 100k in terms of RoI for using Pinot
  @mayanks: Especially if you think it will grow soon
  @mayanks: If you have real-time, and also want to expose analytics to end users, then you can definitely consider Pinot even for this scale
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel

#linen_dev


@g.kishore: added an integration to this channel:
@pinot-bot: @pinot-bot has joined the channel
@slacksync: @slacksync has joined the channel
@pinot-bot: @pinot-bot has left the channel
@kam: @kam has joined the channel
@kam: @xiangfu0 Pretty sure the issue is the removal on May 6th since it stopped syncing then
@xiangfu0: added an integration to this channel:
@xiangfu0: added an integration to this channel:
@xiangfu0: yes, the pro subscription stopped on that day
@kam: Ahhhh
@xiangfu0: I delete one app and reinstalled linen
@xiangfu0: However, that doesn’t seem to fix it
@kam: Hmm still doesn’t seem to be working i’ll dig a little more
@xiangfu0: got it
@xiangfu0: yeah
@xiangfu0: let’s wait for a while
@xiangfu0: Thanks for your help!
@kam: No problem!
@kam: @xiangfu0 btw if you want google to start finding the conversations you should link to apache-pinot’s linen page from either github or your landing page
@xiangfu0: you mean the website ?
@xiangfu0: like or ?
@xiangfu0: I will add that !
@xiangfu0: I saw this
@xiangfu0: shall I click this `Add to Slack` button and retry?