#general


@gulshan.yadav: @gulshan.yadav has joined the channel
@daniel: @here was wondering whether you might be able to point me to some rule-of-thumb compression benchmarks for Pinot? Cheers
  @daniel: I'm seeing things like Snappy, run-length encoding and so on, but is that something you need to care about, or are they under-the-hood defaults?
  @g.kishore: it depends on the input data format • if it's a row format like CSV, JSON, Avro, or Proto, you can see anywhere between 3x and 10x compression • if it's columnar like ORC/Parquet, you don't see a lot of compression: 0.9x to 1.1x
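The row-vs-columnar gap above comes down to redundancy: row formats repeat field names and interleave types, so a general-purpose codec finds a lot to squeeze, while ORC/Parquet files are already encoded and compressed per column. A rough stdlib-only Python illustration of the row-format case (synthetic data, not a Pinot benchmark):

```python
import gzip
import json
import random

# Row-oriented records: every row repeats the field names, and the
# values come from a narrow range, so the byte stream is highly redundant.
random.seed(42)
rows = [
    {"id": i, "country": random.choice(["US", "DE", "IN"]), "clicks": random.randint(0, 9)}
    for i in range(10_000)
]

raw = "\n".join(json.dumps(r) for r in rows).encode()
packed = gzip.compress(raw)
ratio = len(raw) / len(packed)
print(f"raw={len(raw)}B gzip={len(packed)}B ratio={ratio:.1f}x")
```

On data like this the ratio lands comfortably in the several-x range; a columnar file would have already stripped most of that redundancy before any codec runs, which is why re-compressing it gains little.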
@ranabanerji: @ranabanerji has joined the channel
@kha.nguyen: @kha.nguyen has joined the channel
@mayanks: Congratulations to the Apache DataSketches team (@leerho) on graduating to an Apache top-level project. Glad to mention that Apache Pinot already provides support for Theta-Sketch-based count-distinct (and set expression evaluations):
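For intuition, Theta sketches estimate distinct counts from the smallest hash values seen, which is also what lets sketches be unioned and intersected for set expressions. A toy K-minimum-values estimator in Python — illustrative only, not Pinot's or DataSketches' implementation:

```python
import hashlib
import random

def kmv_estimate(items, k=1024):
    """K-minimum-values sketch: hash every item to [0, 1), keep the k
    smallest distinct hashes; distinct count ~ (k - 1) / (k-th minimum)."""
    hashes = set()
    for item in items:
        digest = hashlib.sha1(str(item).encode()).digest()
        # Map the first 8 hash bytes to a float in [0, 1).
        hashes.add(int.from_bytes(digest[:8], "big") / 2**64)
    mins = sorted(hashes)[:k]
    if len(mins) < k:
        return len(mins)  # fewer than k distinct values seen: count is exact
    return int((k - 1) / mins[-1])

random.seed(7)
stream = [random.randrange(50_000) for _ in range(200_000)]
est = kmv_estimate(stream)
true = len(set(stream))
print(f"estimate={est} true={true}")
```

A real sketch keeps only the k smallest hashes at all times (constant memory) and supports union by merging two minimum sets; this toy version keeps every hash for brevity. The relative error shrinks roughly as 1/sqrt(k).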
@murat.ozcan: @murat.ozcan has joined the channel
@hansospina: @hansospina has joined the channel
@karinwolok1: :loudspeaker: Event updates! :loudspeaker: In case you missed this last time, @npawar (Pinot PMC & Committer) and Tim Berglund (Kafka / Confluent) will be presenting tomorrow: Also, we have a talk in 2 weeks on Advanced Pinot Features: Upsert and JSON Indexing with @yupeng and @jackie.jxt
@terrysv: @terrysv has joined the channel
@huangzhenqiu0825: @huangzhenqiu0825 has joined the channel
@tymm:

#random


@gulshan.yadav: @gulshan.yadav has joined the channel
@ranabanerji: @ranabanerji has joined the channel
@kha.nguyen: @kha.nguyen has joined the channel
@murat.ozcan: @murat.ozcan has joined the channel
@hansospina: @hansospina has joined the channel
@terrysv: @terrysv has joined the channel
@huangzhenqiu0825: @huangzhenqiu0825 has joined the channel

#troubleshooting


@gulshan.yadav: @gulshan.yadav has joined the channel
@ranabanerji: @ranabanerji has joined the channel
@kha.nguyen: @kha.nguyen has joined the channel
@kha.nguyen: Hi everyone. I'm currently trying to import some batch data into my Pinot cluster and I'm running into some issues. I have the latest version of Pinot (0.7.0) in a Docker container, and I set everything up manually, following the Docker version of this guide here: . I am able to configure the `baseballStats` offline table with some modifications to the files. When I am uploading my own batch data, I get the following error:
```
400 (Bad Request) with reason: "Cannot add invalid schema: rows_10m. Reason: null"
```
I currently have a CSV that's formatted like this:
```
# /DIRECTORIES/rawdata/rows_10m.csv
id, hash_one, text_one
0, (large integer), a
1, (large integer), b
...
```
a schema.json that has this:
```
# /DIRECTORIES/rows_10m_schema.json
{
  "schemaName": "rows_10m",
  "dimensionFieldSpecs": [
    {
      "datatype": "STRING",
      "name": "text_one"
    }
  ],
  "metricFieldSpecs": [
    {
      "datatype": "INT",
      "name": "id"
    },
    {
      "datatype": "INT",
      "name": "hash_one"
    }
  ]
}
```
and a table config that has this:
```
# /DIRECTORIES/rows_10m_offline_table_config.json
{
  "tableName": "rows_10m",
  "tableTypes": "OFFLINE",
  "segmentsConfig": {
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "rows_10m",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "HEAP",
    "invertedIndexColumns": [ "id", "hash_one" ]
  },
  "metadata": {
    "customConfigs": { }
  }
}
```
This is very similar to what I used when I manually added the default `baseballStats`. Am I missing anything in my schema.json file?
  @wrbriggs: @kha.nguyen With the `APPEND` push type, even with an offline table, I am pretty sure a primary time column is mandatory. Your schema doesn’t define one.
  @wrbriggs: and your table definition doesn’t contain a `timeColumnName` value either - however, the example you’re running is likely trying to push the schema first, and that’s where it’s failing - so you’re not even getting to the table creation or loading the batch CSV
  @wrbriggs: According to the docs: ```The primary time column is used by Pinot, for maintaining the time boundary between offline and realtime data in a hybrid table and for retention management. A primary time column is mandatory if the table's push type is APPEND and optional if the push type is REFRESH.``` (see `DateTime` )
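Under that rule, a minimal sketch of the missing pieces might look like the following, assuming a hypothetical epoch-millis column named `ts` added to the CSV (note also that Pinot's field-spec key is `dataType`, not `datatype`, and the table-config key is `tableType`, not `tableTypes`). Schema addition:
```
"dateTimeFieldSpecs": [
  {
    "name": "ts",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }
]
```
and the corresponding table-config change:
```
"segmentsConfig": {
  "segmentPushType": "APPEND",
  "timeColumnName": "ts",
  "schemaName": "rows_10m",
  "replication": "1"
}
```
This is a sketch against the Pinot 0.7.0 config shapes, not a drop-in fix for the exact files above.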
@mailtobuchi: @mailtobuchi has left the channel
@murat.ozcan: @murat.ozcan has joined the channel
@hansospina: @hansospina has joined the channel
@terrysv: @terrysv has joined the channel
@huangzhenqiu0825: @huangzhenqiu0825 has joined the channel

#pinot-s3


@hansospina: @hansospina has joined the channel

#onboarding


@hansospina: @hansospina has joined the channel

#pinot-dev


@hansospina: @hansospina has joined the channel
@jlli: Hey @slack1, we recently found a bug in this PR (), and I opened a hotfix for it (). Could you review it?
  @slack1: Hey Jack, thanks for the PR. Quick question - did you check with the most recent master? we just merged a related PR 2 days ago:
  @jlli: Yes, I noticed that. But this time it’s on pinot-server. The PR you pointed out is for pinot-controller and pinot-broker.
  @slack1: I could have done a better job with the PR description. we actually fixed the default behavior for server admin port too (see bottom change to ListenerConfigUtil.java)
  @jlli: Cool, I’ve verified that the latest fix works. Thanks for pointing that out! I can close the current PR now.

#community


@terrysv: @terrysv has joined the channel

#announcements


@terrysv: @terrysv has joined the channel

#discuss-validation


@chinmay.cerebro: FYI: opened a new PR for some missing validation. @mayanks might want to eyeball it - I don't think this will cause any issues in the LinkedIn integration tests, but it doesn't hurt to verify
@chinmay.cerebro:
@chinmay.cerebro: lol - it just broke a bunch of integration tests :slightly_smiling_face:. Looks like some integration tests are specifying a range index on a non-numeric column
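For context, the validation in question would reject table configs along these lines (hypothetical table and column names), where `rangeIndexColumns` lists a STRING column even though the range index in this Pinot version only supports numeric columns:
```
"tableIndexConfig": {
  "rangeIndexColumns": [ "someStringColumn" ]
}
```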

#getting-started


@hansospina: @hansospina has joined the channel
