#general


@siavash: @siavash has joined the channel
@siavash: Hi Everyone! :wave: I’m co-founder of an online community platform startup. We help companies create customizable white-labeled social networks to connect their audience. Apache Pinot looks amazing, and we want to use it mostly for our analytics and user segmentation/filtering. Using Pinot for analytics is a no-brainer. However, I’m not sure whether we should use Elasticsearch or Apache Pinot for user filtering. To give you more context: on our platform, users can take different actions such as “Creating a post”, “Liking a post”, “Commenting on a post”, “Buying an item”, etc., and they have different properties such as “Title”, “Age”, “Last Seen At”, etc. An example of user filtering is to fetch all users who have more than 5 posts and 10 comments, are older than 21, and were seen in the last 10 days. We should be able to sort the results on different user columns and paginate them. Here are my two questions: 1. If we use Pinot for user filtering, we will need to set the data retention period to infinite, since filters can apply to any time period, including from the very beginning. Does Pinot slow down as the amount of stored data grows over time? Should we run a cron job, say every month, to roll up all the very old records into one, or is there no need for that? 2. We want to filter on the number of actions (e.g. “Buying an item”), on action fields (e.g. the amount of the item that was bought), and on user fields (which can include custom-defined fields). This means each record we insert will have many columns. For the “Buying an item” example, we need to save all the properties of the buyer, the product, and the price. For other actions, we will need to save other properties. So the number of columns can end up in the hundreds. Is Apache Pinot designed to handle that many columns in the schema? Thanks in advance for the help!
  @g.kishore: Thanks @siavash for the interest in Pinot.
  @g.kishore: If you only need to keep the aggregates, it’s more economical to do periodic aggregations. There is a framework in Pinot (Minion) that helps with this.
  @g.kishore: Hundreds of columns will not be a problem. The metadata will be bigger for a segment, but that’s about it.
  @g.kishore: We do store the list of columns in memory (vs. the data, which is mmapped), so if you have a lot of segments and a lot of columns, the memory required might need to increase. You can assume roughly 100 KB of memory per segment.
  @g.kishore: So it won’t be a lot.
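As a back-of-the-envelope check on that estimate, the sketch below multiplies the roughly 100 KB per segment quoted in this thread by a few segment counts. The segment counts are made-up inputs, and the 100 KB figure is only the rule of thumb given above, not a measured value.

```python
# Rule of thumb quoted in this thread: ~100 KB of memory per segment
# for column metadata. This is an estimate, not a measured number.
METADATA_KB_PER_SEGMENT = 100


def estimate_metadata_mb(num_segments: int,
                         kb_per_segment: int = METADATA_KB_PER_SEGMENT) -> float:
    """Return the approximate metadata footprint in MB for num_segments."""
    return num_segments * kb_per_segment / 1024


if __name__ == "__main__":
    # Hypothetical segment counts, chosen only for illustration.
    for n in (1_000, 10_000, 50_000):
        print(f"{n} segments -> about {estimate_metadata_mb(n):.1f} MB")
```

Even tens of thousands of segments stay in the low gigabytes under this assumption, which matches the "it won't be a lot" conclusion.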
  @siavash: @g.kishore Thanks for making such a great product. The best case would be the ability to run queries over any time period. But if that would slow things down, we could aggregate old content and limit users to filtering up to “1 year ago” OR “All time”. I just wanted to make sure that having hundreds of columns will not slow down queries, especially when we do a GROUP BY. 100 KB per segment feels very reasonable. So it seems there won’t be any issues overall. Thanks!
  @g.kishore: If you need to keep individual records but also need speed, you can use star-tree indexing.
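For orientation, here is a minimal sketch of what a star-tree entry in a Pinot table config might look like, built as a Python dict and printed as JSON. The column names (`communityId`, `actionType`, `amount`) are hypothetical and do not come from this thread. The field names follow the star-tree index configuration as documented by Pinot, but they should be checked against the documentation for the Pinot version in use.

```python
import json

# Hypothetical columns: communityId, actionType, amount.
# Field names are assumed to match Pinot's star-tree index config;
# verify them against the docs for your Pinot version.
star_tree_index_config = {
    # Dimensions to pre-aggregate on, in split order.
    "dimensionsSplitOrder": ["communityId", "actionType"],
    "skipStarNodeCreationForDimensions": [],
    # Pre-computed aggregations, written as FUNCTION__column.
    "functionColumnPairs": ["COUNT__*", "SUM__amount"],
    # Upper bound on records under a leaf node before it is split further.
    "maxLeafRecords": 10000,
}

# This fragment would sit under "tableIndexConfig" in the table config.
table_index_config_fragment = {
    "starTreeIndexConfigs": [star_tree_index_config],
}

if __name__ == "__main__":
    print(json.dumps(table_index_config_fragment, indent=2))
```

The idea is that the raw records stay queryable while counts and sums over the listed dimensions are served from pre-aggregated nodes.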
  @siavash: One more question: let’s say that in a social feed people can create posts, comment on posts, and react to posts, similar to LinkedIn. Now we want to show on each user’s profile how many of each activity they’ve done. In the traditional approach, we would store a count for every activity in the user record, for instance `postsCount`, `likesCount`, `commentsCount`, etc. Given that aggregations are super fast on Pinot, does it make sense to stop storing these on the user record? Or is it still better to keep them there as a cache and update it whenever we hit Apache Pinot?
  @g.kishore: With periodic aggregation, it’s ok to serve this directly from Pinot.
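To make the segmentation example from the start of this thread concrete, the sketch below expresses it in plain Python over in-memory records. It is not Pinot code. The record shapes (`id`, `age`, `lastSeenAt`, `userId`, `type`) and the sample data are made up for illustration, and the condition is read as "more than 5 posts and more than 10 comments".

```python
from collections import Counter
from datetime import datetime, timedelta


def segment_users(users, actions, now, page=0, page_size=2, sort_key="age"):
    """Filter, sort, and paginate users based on action counts and properties."""
    # Count actions per (user, action type), like GROUP BY userId, actionType.
    counts = Counter((a["userId"], a["type"]) for a in actions)
    cutoff = now - timedelta(days=10)
    matched = [
        u for u in users
        if counts[(u["id"], "post")] > 5        # more than 5 posts
        and counts[(u["id"], "comment")] > 10   # more than 10 comments
        and u["age"] > 21                       # older than 21
        and u["lastSeenAt"] >= cutoff           # seen in the last 10 days
    ]
    matched.sort(key=lambda u: u[sort_key])
    start = page * page_size
    return matched[start:start + page_size]


# Made-up sample data: every user has 6 posts and 11 comments.
now = datetime(2021, 1, 31)
users = [
    {"id": "u1", "age": 30, "lastSeenAt": datetime(2021, 1, 30)},
    {"id": "u2", "age": 19, "lastSeenAt": datetime(2021, 1, 30)},  # too young
    {"id": "u3", "age": 40, "lastSeenAt": datetime(2020, 12, 1)},  # not seen recently
]
actions = [{"userId": u["id"], "type": "post"} for u in users for _ in range(6)]
actions += [{"userId": u["id"], "type": "comment"} for u in users for _ in range(11)]

if __name__ == "__main__":
    print([u["id"] for u in segment_users(users, actions, now)])
```

The same per-user counts are what a profile page would display, so the filter and the profile counters can come from one aggregation over the action records.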

#random


@siavash: @siavash has joined the channel