#general
@wcxzjtz: quick question: ```{"H3IndexFilterOperator Time":16},{"DocIdSetOperator Time":16}``` what do the numbers in the query `traceinfo` mean? is it 16ms?
@jackie.jxt: Yes, it is in millis
@jackie.jxt: FYI, `DocIdSetOperator` time includes the `H3IndexFilterOperator` time, it is hierarchical
@wcxzjtz: Oh, got it. Thanks. I was just about to ask why these two numbers add up to more than the whole query latency shown in the UI.
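[Editor's note] A minimal sketch of the point above: because the operator timings are hierarchical, a child operator's time is already contained in its parent's, so the numbers should not simply be summed. The dictionary below just reuses the trace values from the question; it is illustrative, not Pinot code.

```python
# Trace entries from the question above. Timings are in milliseconds and
# hierarchical: DocIdSetOperator's time already includes its child,
# H3IndexFilterOperator.
trace = {"H3IndexFilterOperator Time": 16, "DocIdSetOperator Time": 16}

child_ms = trace["H3IndexFilterOperator Time"]
parent_ms = trace["DocIdSetOperator Time"]

# Time spent in DocIdSetOperator itself, excluding its child.
exclusive_parent_ms = parent_ms - child_ms

print(exclusive_parent_ms)  # → 0
```

So here essentially all of the DocIdSetOperator time was spent inside the H3 index filter, which is why adding the two raw numbers overstates the total.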
@octchristmas: Hi team, I am using HDFS as the deepstore. I am trying to run batch ingestion with Spark on the deepstore HDFS cluster, but I am having difficulty using another HDFS cluster as the input of the batch job spec. Is such a deployment configuration possible?
@kharekartik: Hi, no. Currently we support only a single config per filesystem; the work for multiple configs per filesystem is WIP
@octchristmas: @kharekartik By single configuration you mean only one filesystem? Or do you mean only one kind of filesystem?
@octchristmas: Let me share my test case. I succeeded in ingesting into the deepstore HDFS by reading data from the other HDFS cluster in standalone mode with the configuration below. `executionFrameworkSpec:` `name: 'standalone'` `jobType: SegmentCreationAndTarPush` `inputDirURI: '
@octchristmas: I tried the same approach in spark mode but it failed. I've tried various things other than this configuration, but it fails every time. `executionFrameworkSpec:` `name: 'spark'` `extraConfigs:` `stagingDir: '
@kharekartik: yep, this won't work currently since we only use one of these configs for the `hdfs` scheme. Support for multiple configs based on path + scheme is work in progress
@octchristmas: @kharekartik in my tests the deployment was successful in standalone mode using another cluster as the inputDir. Is that successful test a different case?
@kharekartik: what hdfs cluster did your test pinot use?
@octchristmas: @kharekartik In the standalone mode above, the 'deepstore' cluster and the 'another cluster' are different HDFS clusters. The HDFS clusters are Cloudera.
@kharekartik: moving this conversation to DM
@methor1992: @methor1992 has joined the channel
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel
#random
@methor1992: @methor1992 has joined the channel
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel
#feat-compound-types
@slacksync: @slacksync has joined the channel
#feat-text-search
@slacksync: @slacksync has joined the channel
#feat-rt-seg-complete
@slacksync: @slacksync has joined the channel
#feat-presto-connector
@slacksync: @slacksync has joined the channel
#feat-upsert
@slacksync: @slacksync has joined the channel
#pinot-helix
@slacksync: @slacksync has joined the channel
#group-by-refactor
@slacksync: @slacksync has joined the channel
#qps-metric
@slacksync: @slacksync has joined the channel
#order-by
@slacksync: @slacksync has joined the channel
#feat-better-schema-evolution
@slacksync: @slacksync has joined the channel
#fraud
@slacksync: @slacksync has joined the channel
#pinotadls
@slacksync: @slacksync has joined the channel
#inconsistent-segment
@slacksync: @slacksync has joined the channel
#pinot-power-bi
@slacksync: @slacksync has joined the channel
@g.kishore: removed an integration from this channel:
@slacksync: @slacksync has joined the channel
#apa-16824
@slacksync: @slacksync has joined the channel
#pinot-website
@slacksync: @slacksync has joined the channel
#minion-star-tree
@slacksync: @slacksync has joined the channel
#troubleshooting
@deemish2: Hi team, I am trying to execute a backfill job using a pinot-ingestion job. Basically, I am trying to create offline segments with it. Can we fix the offline segment size without using Minion while executing the backfill job, if there is too much data? Can anyone please help with the same?
@xiaobing: if you meant such
@deemish2: I am using a regex under includeFilePattern so that it picks up all the files and generates offline segments. If there are too many records, it will create multiple offline segments. Basically, I want to generate offline segments based on record count/size using the pinot-ingestion job
@xiaobing: hmm.. for the pinot-ingestion job, I didn't see configs to tune output segment size, and it's one segment per file from the implementation. But I might have looked at the wrong code, so could you share the job spec you used so I can double-check? Thank you
@methor1992: @methor1992 has joined the channel
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@luisfernandez: hello friends!! we are encountering some issues when migrating data using the job spec. we are basically migrating a bunch of JSON files in GCS into Pinot. a JSON file looks like this: ```{"serve_time":1623110400.00000000,"p_id":8.0476135E7,"u_id":6047599.0,"i_count":1} {"serve_time":1623110400.00000000,"p_id":8.1923416E7,"u_id":5407252.0,"i_count":1,"c_count":1,"c":17}``` we end up with this exception for some of the files: ```2022/05/10 15:48:19.314 ERROR [SegmentGenerationJobRunner] [pool-2-thread-1] Failed to generate Pinot segment for file -
@diogo.baeder: Looks like a JSONL file, not a regular JSON file. If you want a JSON file with multiple rows, you need to put each row as an item of a list, instead of each dict as a line in the file. (E.g. just wrap that whole content with square brackets.)
@diogo.baeder: AFAIK Pinot doesn't support JSONL.
@luisfernandez: hey thank you, we were following this:
@diogo.baeder: Oh... then I don't know, to be honest. What I do know is that that format is not valid JSON - regular JSON parsers won't be able to read it as JSON. That format is JSONL (notice the "L" at the end), where a file has multiple lines and each line contains a valid JSON string.
@diogo.baeder: In my case, in the system I'm developing with Pinot as a database, I'm ingesting from regular JSON files, which always start and end with square brackets.
@luisfernandez: so what you are saying is that this ```{"serve_time":1623110400.00000000,"p_id":8.0476135E7,"u_id":6047599.0,"i_count":1} {"serve_time":1623110400.00000000,"p_id":8.1923416E7,"u_id":5407252.0,"i_count":1,"c_count":1,"c":17}``` should become this ```[{"serve_time":1623110400.00000000,"p_id":8.0476135E7,"u_id":6047599.0,"i_count":1}, {"serve_time":1623110400.00000000,"p_id":8.1923416E7,"u_id":5407252.0,"i_count":1,"c_count":1,"c":17}]```
@diogo.baeder: Yeah, maybe that makes it work. Just try it, if it works then that was the problem :slightly_smiling_face:
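[Editor's note] The wrapping suggested above can be scripted. This is a hypothetical helper (the function name is made up, not from the thread) that turns JSONL text, one JSON object per line, into a single JSON array:

```python
import json

def jsonl_to_json_array(jsonl_text: str) -> str:
    """Convert JSONL (one JSON object per line) into one JSON array string."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return json.dumps(rows)

# Simplified rows in the spirit of the files discussed above.
jsonl = '{"p_id": 1, "i_count": 1}\n{"p_id": 2, "i_count": 3}'
print(jsonl_to_json_array(jsonl))
# → [{"p_id": 1, "i_count": 1}, {"p_id": 2, "i_count": 3}]
```

Parsing each line and re-serializing (rather than just wrapping with brackets and commas via string concatenation) also surfaces any malformed line immediately.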
@luisfernandez: the weirdest thing is that we just tried one of the files that failed, on its own with the job spec, and it worked :smile:
@diogo.baeder: With JSONL it worked, then?
@luisfernandez: yes :smile:
@luisfernandez: for one file
@luisfernandez: then we shove a bunch of them in and it doesn't like it
@diogo.baeder: Hmmm... maybe there's an issue with one of the lines then. I noticed that one of your lines has `c_count` and `c`, where the other doesn't. Maybe them being missing is an issue? Did you set a null value for those columns?
@luisfernandez: I didn't, and I also thought about that
@luisfernandez: but the thing is that when we try to just import that one document everything works lol
@diogo.baeder: Got it. I don't know what the problem is then. What I would do in that case is a "binary search" to find the offending line - try half of the document first; if it doesn't work, cut it in half again; if it works, bring back some lines; and so on, until I find the problematic line.
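[Editor's note] The manual bisection described above can be automated: since each JSONL line should parse on its own, a short script can report every offending line directly. This is a hypothetical sketch (the function name is made up, not from the thread):

```python
import json

def find_bad_lines(jsonl_text: str):
    """Return (line_number, error_message) pairs for lines that fail to parse."""
    bad = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            bad.append((lineno, str(exc)))
    return bad

# Line 2 has a trailing comma, which is invalid JSON.
sample = '{"a": 1}\n{"a": 2,}\n{"a": 3}'
print(find_bad_lines(sample))  # reports line 2
```

This is linear rather than binary search, but for a per-line format it finds every bad line in one pass.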
@luisfernandez: we will try another data format for now
@diogo.baeder: Cool
@luisfernandez: these files are generated by spark into this json format
@luisfernandez: then we have in gcs
@diogo.baeder: Got it
@luisfernandez: `year/month/day/partfiles.json` and this is what we want to eventually put into pinot, and those are 2 years worth of data
@diogo.baeder: Got it, sounds good
@luisfernandez: this job yaml has not been super straightforward to get right lol
@luisfernandez: do you know who else may have some experience with it?
@diogo.baeder: I'm developing a system that was using the regular batch ingestion flow, but now I'm manually ingesting segment data - which also fills my offline table, but through a bit of a different process. The previous process used to work for me.
@luisfernandez: like ingestFromURI?
@diogo.baeder: What I'm using now? Yes.
@diogo.baeder: The previous flow was just the regular batch ingestion, with a job YAML config file, which I triggered via the Pinot admin CLI.
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel
#pinot-s3
@slacksync: @slacksync has joined the channel
#pinot-k8s-operator
@slacksync: @slacksync has joined the channel
#onboarding
@slacksync: @slacksync has joined the channel
#feat-geo-spatial-index
@slacksync: @slacksync has joined the channel
#transform-functions
@slacksync: @slacksync has joined the channel
#custom-aggregators
@slacksync: @slacksync has joined the channel
#inconsistent-perf
@slacksync: @slacksync has joined the channel
#docs
@slacksync: @slacksync has joined the channel
#aggregators
@slacksync: @slacksync has joined the channel
#query-latency
@slacksync: @slacksync has joined the channel
#dhill-date-seg
@slacksync: @slacksync has joined the channel
#enable-generic-offsets
@slacksync: @slacksync has joined the channel
#pinot-dev
@slacksync: @slacksync has joined the channel
#community
@slacksync: @slacksync has joined the channel
#feat-pravega-connector
@slacksync: @slacksync has joined the channel
#announcements
@slacksync: @slacksync has joined the channel
#s3-multiple-buckets
@slacksync: @slacksync has joined the channel
#release-certifier
@slacksync: @slacksync has joined the channel
#multiple_streams
@slacksync: @slacksync has joined the channel
#lp-pinot-poc
@slacksync: @slacksync has joined the channel
#roadmap
@slacksync: @slacksync has joined the channel
#presto-pinot-connector
@slacksync: @slacksync has joined the channel
#multi-region-setup
@slacksync: @slacksync has joined the channel
#metadata-push-api
@slacksync: @slacksync has joined the channel
#pql-sql-regression
@slacksync: @slacksync has joined the channel
#pinot-realtime-table-rebalance
@slacksync: @slacksync has joined the channel
#release060
@slacksync: @slacksync has joined the channel
#time-based-segment-pruner
@slacksync: @slacksync has joined the channel
#discuss-validation
@slacksync: @slacksync has joined the channel
#segment-cold-storage
@slacksync: @slacksync has joined the channel
#new-office-space
@slacksync: @slacksync has joined the channel
#config-tuner
@slacksync: @slacksync has joined the channel
#test-channel
@slacksync: @slacksync has joined the channel
#pinot-perf-tuning
@slacksync: @slacksync has joined the channel
#thirdeye-pinot
@slacksync: @slacksync has joined the channel
#getting-started
@methor1992: @methor1992 has joined the channel
@vanduc.dn: Hi all, I'm evaluating Pinot for realtime analytics for a feature in our mobile app. The total volume is about 50-100K transactions/day. Is it good to adopt Pinot? I'm afraid of putting too much engineering into it.
@g.kishore: While we would love for you to use Pinot, the data size (100k) is too small to need something like Pinot. An RDBMS like Postgres or MySQL should be a good start.
@mayanks: Is 100k per day or total? Also, what is the total retention?
@vanduc.dn: 100K is per day, Total now is 30 million records.
@vanduc.dn: Could you advise at what point it would be good to use Pinot? Besides the main topic `txn`, we also have other smaller topics which might be joined with the txn ones to show aggregation reports.
@vanduc.dn: We plan to reach 200K txns/day as our user base grows
@mayanks: 30M rows is still small, but definitely better than 100k in terms of RoI for using Pinot
@mayanks: Especially if you think it will grow soon
@mayanks: If you have real-time, and also want to expose analytics to end users, then you can definitely consider Pinot even for this scale
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel
#feat-partial-upsert
@slacksync: @slacksync has joined the channel
#pinot_website_improvement_suggestions
@slacksync: @slacksync has joined the channel
#segment-write-api
@slacksync: @slacksync has joined the channel
#releases
@slacksync: @slacksync has joined the channel
#metrics-plugin-impl
@slacksync: @slacksync has joined the channel
#debug_upsert
@slacksync: @slacksync has joined the channel
#flink-pinot-connector
@slacksync: @slacksync has joined the channel
#pinot-rack-awareness
@slacksync: @slacksync has joined the channel
#minion-improvements
@slacksync: @slacksync has joined the channel
#fix-numerical-predicate
@slacksync: @slacksync has joined the channel
#complex-type-support
@slacksync: @slacksync has joined the channel
#fix_llc_segment_upload
@slacksync: @slacksync has joined the channel
#product-launch
@slacksync: @slacksync has joined the channel
#pinot-docsrus
@slacksync: @slacksync has joined the channel
#pinot-trino
@slacksync: @slacksync has joined the channel
#kinesis_help
@slacksync: @slacksync has joined the channel
#udf-type-matching
@slacksync: @slacksync has joined the channel
#jobs
@vinichhajed: @vinichhajed has joined the channel
@slacksync: @slacksync has joined the channel
#introductions
@methor1992: @methor1992 has joined the channel
@sukru.haciyanli: @sukru.haciyanli has joined the channel
@vinichhajed: @vinichhajed has joined the channel
@kam: @kam has joined the channel
@slacksync: @slacksync has joined the channel
#linen_dev
@g.kishore: added an integration to this channel:
@pinot-bot: @pinot-bot has joined the channel
@slacksync: @slacksync has joined the channel
@pinot-bot: @pinot-bot has left the channel
@kam: @kam has joined the channel
@kam: @xiangfu0 Pretty sure the issue is the removal on May 6th since it stopped syncing then
@xiangfu0: added an integration to this channel:
@xiangfu0: added an integration to this channel:
@xiangfu0: yes, the pro subscription stopped on that day
@kam: Ahhhh
@xiangfu0: I deleted one app and reinstalled linen
@xiangfu0: However, that doesn't seem to fix it
@kam: Hmm, still doesn't seem to be working. I'll dig a little more
@xiangfu0: got it
@xiangfu0: yeah
@xiangfu0: let’s wait for a while
@xiangfu0: Thanks for your help!
@kam: No problem!
@kam: @xiangfu0 btw if you want google to start finding the conversations you should link to apache-pinot’s linen page from either github or your landing page
@xiangfu0: you mean the website ?
@xiangfu0: like
@xiangfu0: I will add that !
@xiangfu0: I saw this
@xiangfu0: shall I click this `Add to Slack` button and retry?