#general


@dadelcas: Hello there, the docs say that a shared volume is required for controllers if more than one is to be deployed. Can someone shed some light on why this is needed instead of each controller having its own storage? Will all the controllers be active? Will they all write to the volume simultaneously? Any considerations we should take into account in a multi-controller environment? My deployment is on k8s
  @mayanks: The shared volume is used for storing the golden copy of the data as it is pushed. Typically you want to configure a deep store (e.g. S3) for this purpose
  @dadelcas: Cool, so in fact all the controllers will be active if I understand you correctly, is that right?
  @mayanks: Yes, all controllers will be active to provide fault tolerance, so if one goes down you will not lose availability
  @dadelcas: Thanks for confirming :+1:
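For reference, a minimal sketch of what the controller config can look like when the segment store (deep store) points at S3 rather than a shared volume, along the lines of the Pinot deep-store docs; the bucket, region, and paths below are placeholders:
```
# Controller config: keep the golden copy of segments in S3 instead of local/shared disk
controller.data.dir=s3://your-bucket/pinot/controller-data
controller.local.temp.dir=/tmp/pinot-controller-tmp
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```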
@zineb.raiiss: Hello friends, can you help me please? If I turn off the PC, everything I did is gone. So when I connect to my machine again, how can I run ThirdEye? I already installed it and got to the first page, but now I want to re-run it and I don't know how
@salkadam: @salkadam has joined the channel

#random


@salkadam: @salkadam has joined the channel

#troubleshooting


@bajpai.arpita746462: Hi everyone, I am trying to run the Spark ingestion job with Apache Pinot 0.8.0 in my own cluster setup. I am able to run the standalone job, but when I try to run the Spark ingestion job it gives me the following error: `java.lang.RuntimeException: Failed to create IngestionJobRunner instance for class - org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner`. I am using the below command to run the Spark job: `${SPARK_HOME}/bin/spark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --master "local[2]" --deploy-mode client --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-SNAPSHOT-jar-with-dependencies.jar" local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-SNAPSHOT-jar-with-dependencies.jar -jobSpecFile ${PINOT_DISTRIBUTION_DIR}/examples/batch/transcript/transcript_local_jobspec.yaml`. I am also attaching a screenshot of the error and my job spec file for better understanding. Could anyone please help with the same?
  @mayanks: What version of Spark? If 3.x, maybe try 2.x
  @ken: If you’re running Pinot 0.8, then I think this is a known regression (from 0.7.1). See , which has some details of how @kulbir.nijjer worked around this using Spark’s support for `dependencyJarDir`. There’s a PR () to fix this, which works for Hadoop; I haven’t tried it with Spark.
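As a rough sketch of the `dependencyJarDir` workaround mentioned above: the Spark runner can be pointed at a directory of plugin/dependency jars via the job spec's `extraConfigs`. The paths below are placeholders and the exact layout depends on your job spec and filesystem:
```yaml
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  extraConfigs:
    # Staging directory used by the Spark runner (placeholder path)
    stagingDir: 'hdfs:///pinot/staging'
    # Directory holding the Pinot plugin/dependency jars to distribute to executors (placeholder path)
    dependencyJarDir: 'hdfs:///pinot/plugins'
```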
@zineb.raiiss: Hello friends, can you help me please? If I turn off the PC, everything I did is gone. So when I connect to my machine again, how can I run ThirdEye? I already installed it and got to the first page, but now I want to re-run it and I don't know how.
  @mayanks: Hi, there is a separate Slack workspace for TE which might get you a faster response, cc @pyne.suvodeep
  @pyne.suvodeep: @zineb.raiiss I think you are using the default H2 DB, which is in-memory, so data is lost when the process stops. My suggestion would be to install MySQL 5.7 and use ThirdEye with it. This doc is a bit out of date but should still help
  @zineb.raiiss: @pyne.suvodeep can you add me in workspace for TE? this is my e-mail:
  @npawar: Sent you an invite for TE
  @zineb.raiiss: Oooh, I received it, thank you so much Neha :smiley:
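For context, moving ThirdEye off the in-memory H2 DB means pointing its persistence config at MySQL. A rough sketch of what that typically looks like; the file location and exact keys vary by ThirdEye version, so treat these as assumptions and check the ThirdEye docs:
```yaml
# persistence.yml (location and exact keys depend on your ThirdEye version)
databaseConfiguration:
  url: jdbc:mysql://localhost:3306/thirdeye?autoReconnect=true
  user: thirdeye
  password: changeme
  driver: com.mysql.jdbc.Driver
```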
@salkadam: @salkadam has joined the channel
@luisfernandez: If you use a star-tree index and you have a time column, and you want to do different aggregations based on time, does that mean the time column also has to be part of the star-tree dimensions? Say I have user_id, click_count, serve_time; then I should use user_id, serve_time as dimensions and SUM(click_count) as the aggregation
  @mayanks: In the default config it is already added:
  @g.kishore: Short answer - yes, time should be part of the index
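A minimal sketch of what that star-tree config could look like in the table config, assuming the column names from the question (`user_id`, `serve_time`, `click_count`); the `maxLeafRecords` value is just a placeholder:
```json
"tableIndexConfig": {
  "starTreeIndexConfigs": [
    {
      "dimensionsSplitOrder": ["user_id", "serve_time"],
      "skipStarNodeCreationForDimensions": [],
      "functionColumnPairs": ["SUM__click_count"],
      "maxLeafRecords": 10000
    }
  ]
}
```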

#pinot-dev


@yuchaoran2011: @yuchaoran2011 has joined the channel

#getting-started


@salkadam: @salkadam has joined the channel
