Metadata Management

2017-10-19 Vasu Gourabathina
All: This may be off topic for Spark, but I'm sure several of you have used some form of this in your Big Data implementations, so I wanted to reach out. As part of the Data Lake and Data Processing (with Spark as an example), we might end up with different form-factors for the files (via,

Design aspects of Data partitioning for Window functions

2017-08-30 Vasu Gourabathina
All: If this question has already been discussed, please let me know and I will look in the archive. Data Characteristics: entity_id date fact_1 fact_2 fact_N derived_1 derived_2 derived_X a) There are 1000s of such entities in the system b) Each one has various Fact attributes per

Cluster to Cluster communication

2017-02-08 Vasu Gourabathina
All, This is a theoretical question at this point in time. I wanted to pose it before spending too much time trying to figure it out. Apologies in advance if this is not the right forum for it. Use-case: - Migration from one cluster manager to another (e.g. Spark stand-alone to

Design patterns for Spark implementation

2016-12-03 Vasu Gourabathina
Hi, I know this is a broad question. If this is not the right forum, I would appreciate it if you could point me to other sites/areas that may be helpful. Before posing this question, I did use our friend Google, but filtering the search results down to what I need hasn't been easy. Who I am: - Have done