#general


@arun11299: @arun11299 has joined the channel
@arun11299: Hello Folks, Pinot competes almost entirely in the same space as ClickHouse, yet there are surprisingly few direct comparisons between them. I am designing an analytics pipeline + platform and have been investigating many big data technologies. I initially selected ClickHouse for its install simplicity, scale, and performance when tested with our dataset. I am now wondering what Pinot has to offer and would really love to evaluate it as well. If anybody else has been on this journey or has some thoughts about the architectural and feature differences between these two systems, please do share. Thanks in advance!
  @mayanks: Hi @avasudevan, this is a good starting point:
  @g.kishore: most users end up doing the comparison for their specific use cases, and I think that's the right thing to do. I can list a few differences, but I highly encourage you to validate and make your own decision.
  • ClickHouse is C++ and Pinot is Java. In today's world it does not matter much, but this does give a slight edge to ClickHouse over Pinot.
  • ClickHouse has a simpler architecture than Pinot to get started with, but once you run it in production, ClickHouse also needs ZooKeeper, and it is actually a lot more complex than Pinot as you scale.
  *Performance + features*
  • ClickHouse has sparse indexing, whereas Pinot has both sparse and row-level indexing. This is super important, and Pinot will be faster than ClickHouse for any use case that requires slicing and dicing.
  • ClickHouse full table scans are faster than Pinot's. Pinot is highly optimized for random access patterns.
  • Pinot has a ton of indexing techniques: inverted index, geo index, range index, JSON index, and many more. This is the most powerful aspect of Pinot. The architecture is flexible enough to add indexes at will.
  • ClickHouse has traditional materialized views, while Pinot has the star-tree index concept, which is smarter and more effective. ClickHouse also has some partial join support for joins and other cross-table queries.
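  For reference, the indexing techniques mentioned above are declared per column in a Pinot table config's `tableIndexConfig` section. A minimal fragment as a sketch; the column names are hypothetical, and a real config also needs `segmentsConfig` and related sections: ```{
  "tableName": "myTable_OFFLINE",
  "tableType": "OFFLINE",
  "tableIndexConfig": {
    "invertedIndexColumns": ["country"],
    "rangeIndexColumns": ["price"],
    "jsonIndexColumns": ["payload"],
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["country", "browser"],
        "functionColumnPairs": ["SUM__price"]
      }
    ]
  }
}```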

#random


@arun11299: @arun11299 has joined the channel

#troubleshooting


@arun11299: @arun11299 has joined the channel
@qianbo.wang: Hi pinot experts, I tried to upload a segment with the following command: ```pinot-admin.sh LaunchDataIngestionJob -jobSpecFile batch_ingestion.yaml ``` and I can see the output shows it returned 200 and thought it was successful: ```Start pushing segments: ... Response for pushing table enriched_invoices_experiment segment enriched_invoices_experiment_OFFLINE_1595614484_1631217493_0 to location - 200: {"status":"Successfully uploaded segment: enriched_invoices_experiment_OFFLINE_1595614484_1631217493_0 of table: enriched_invoices_experiment"}``` However, I don’t see it showing up by querying the table. Could you please give some pointers where I could look up the issue? I checked the logs and didn’t find errors.
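  For context, the `batch_ingestion.yaml` job spec passed to `LaunchDataIngestionJob` above would look roughly like this. A hedged sketch with hypothetical paths and hosts; field and class names follow Pinot's standalone ingestion job spec: ```executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/path/to/input'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/path/to/segments'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
tableSpec:
  tableName: 'enriched_invoices_experiment'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'```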
  @mayanks: can you check the external view of the table via swagger?
  @mayanks: If the segment shows there, then can you make the query to the offline table only (use <tableName>_OFFLINE in your query)?
  @qianbo.wang: Do you mean checking the segments using this query? ```curl -X GET "http://<pinot-endpoint>.internal/segments/enriched_invoices_experiment?type=OFFLINE" -H "accept: application/json"```
  @qianbo.wang: hmm, it doesn’t return anything
  @qianbo.wang: oh, you probably meant this one: ```curl -X GET "<tablename>/externalview" -H "accept: application/json"```
  @mayanks: yes
  @qianbo.wang: it returns no segment from this call: ```"OFFLINE":{},"REALTIME":null}```
  @mayanks: How about idealstate?
  @qianbo.wang: the same: ```✗ curl -X GET "" -H "accept: application/json" {"OFFLINE":{},"REALTIME":null}%```
  @qianbo.wang: I tried uploading again and I see several log lines from the controller like this: ```│ istio-proxy [2021-09-19T21:59:31.725Z] "POST /v2/segments?tableName=enriched_invoices_experiment HTTP/1.1" 200 - via_upstream - "-" 7541946 143 14815 105 "-" "Apache-HttpClient/4.5.9 (Java/1.8.0_222)" "934c7966-4e04-48a1-9219-7de02cab41df" │ │ "10.124.9.8:9000" "10.124.9.8:9000" inbound|9000|| 127.0.0.6:52717 10.124.9.8:9000 10.44.3.130:55565 - default```
  @mayanks: Hmm, your ideal state is empty
  @mayanks: Try the debug endpoint with high verbosity
  @mayanks: May be you don’t have any servers tagged with the tag in table?
  @qianbo.wang: ```✗ curl -X GET "" -H "accept: application/json" {"code":404,"error":"HTTP 404 Not Found"}%``` does it mean the table not exist?
  @qianbo.wang: > May be you don’t have any servers tagged with the tag in table? could you please explain more about this?
  @qianbo.wang: I can see the segments by this request: ```✗ curl -X GET "" -H "accept: application/json" [{"OFFLINE":["enriched_invoices_experiment_OFFLINE_1595614484_1631217493_0","enriched_invoices_experiment_OFFLINE_1614584553_1629482988_0"]}]%```
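  The symptom in this thread (segments listed by the controller, but empty ideal state and external view) can be classified in one pass over the three responses. A minimal sketch; the function name and the interpretation strings are illustrative, not from the thread, and assume the standard controller endpoints `/segments/{table}`, `/tables/{table}/idealstate`, and `/tables/{table}/externalview`:

```python
def diagnose(segments, ideal_state, external_view):
    """Classify a Pinot offline table's state from three controller responses.

    segments:      list of segment names from GET /segments/{table}
    ideal_state:   OFFLINE map from GET /tables/{table}/idealstate
    external_view: OFFLINE map from GET /tables/{table}/externalview
    """
    if not segments:
        return "no segments uploaded"
    if not ideal_state:
        # Segments exist in the segment store but were never assigned to
        # servers, e.g. no servers carry the table's tenant tag (as
        # suggested earlier in this thread).
        return "segments present but ideal state empty: check server tags"
    if set(ideal_state) - set(external_view):
        return "assigned but not yet online: check server logs"
    return "ok"

# The responses pasted in this thread, as Python literals:
segments = ["enriched_invoices_experiment_OFFLINE_1595614484_1631217493_0"]
print(diagnose(segments, {}, {}))
# -> segments present but ideal state empty: check server tags
```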
  @qianbo.wang: feel free to get back tomorrow whenever you are available. it is not urgent..
  @qianbo.wang: Hey, it turns out that there were some old segments left in my temp folder which were uploaded together for the table. I guess that caused some problems internally, though I don’t see any errors. Thanks for checking
  @mayanks: So it is working now?
  @mayanks: If so can you describe what happened?
  @qianbo.wang: Yeah, it is working now. I used the same local folder to store segments for different tables. Every time I ran the batch ingestion job, it uploaded all the segments to one table, even though they were supposed to go to different tables. I think that caused the issues with the segments not showing up for the table. However, I don’t see any error to confirm my guess.
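  The root cause described above (a shared output folder holding segments for several tables, all pushed together) can be guarded against before pushing, since the segment names in this thread start with `<tableName>_OFFLINE`. A small sketch with a hypothetical helper, not part of any Pinot API:

```python
def segments_for_table(segment_names, table_name):
    """Keep only segments whose name starts with '<tableName>_OFFLINE'.

    Guards against pushing leftover segments from other tables when a
    shared output directory is reused between ingestion jobs.
    """
    prefix = f"{table_name}_OFFLINE"
    return [s for s in segment_names if s.startswith(prefix)]

leftovers = [
    "enriched_invoices_experiment_OFFLINE_1595614484_1631217493_0",
    "other_table_OFFLINE_1600000000_1600001000_0",
]
print(segments_for_table(leftovers, "enriched_invoices_experiment"))
# -> ['enriched_invoices_experiment_OFFLINE_1595614484_1631217493_0']
```

  The cleaner fix is simply a dedicated output folder per table, so each job only ever sees its own segments.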