baganokodo2022 commented on PR #38922: URL: https://github.com/apache/spark/pull/38922#issuecomment-1341534552
Hi @SandishKumarHN,

For the `recursionDepth` option, could we consider naming it `CircularReferenceTolerance` or `CircularReferenceDepth` for clarity? For instance, -1 (the default value) errors out on any circular reference, 0 drops any circular reference field, 1 allows the same field to be entered twice, and so on.

Besides, can we also support a `CircularReferenceType` option with an enum value of `[FIELD_NAME, FIELD_TYPE]`? The reason is that navigation can go very deep before the same **fully-qualified** `FIELD_NAME` is encountered again, while `FIELD_TYPE` stops recursive navigation much sooner. We could make `FIELD_NAME` the default option.

In my test cases, with `FIELD_TYPE` a circular reference can repeat 3 times before the executor hits OOM, while with `FIELD_NAME` the executor hits OOM when `CircularReferenceTolerance` is set to 1.

Please let me know your thoughts.

cc @rangadi

Thank you,
Xinyu Liu
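To make the proposed semantics concrete, here is a minimal Python sketch of how the tolerance value and the two reference types could interact while expanding a recursive message schema. It is not the Spark implementation: `expand_schema`, `CircularReferenceError`, the dict-based schema representation, and the choice to count the root message as the first visit of its type under `FIELD_TYPE` are all assumptions made for illustration.

```python
from collections import Counter


class CircularReferenceError(Exception):
    """Raised when a circular reference is found and tolerance is -1."""


def expand_schema(schema, root, tolerance=-1, ref_type="FIELD_NAME"):
    """Expand a possibly recursive message schema into a nested dict.

    schema maps a message name to a list of (field_name, field_type) pairs.
    A field_type that is not itself a key of schema is treated as a scalar.
    All names here are hypothetical, not part of any Spark API.
    """
    if ref_type not in ("FIELD_NAME", "FIELD_TYPE"):
        raise ValueError(f"unknown ref_type: {ref_type}")

    # Counts how often each key occurs on the current root-to-field path.
    seen = Counter()
    if ref_type == "FIELD_TYPE":
        # Assumption: the root message counts as the first visit of its type.
        seen[root] = 1

    def visit(message):
        struct = {}
        for name, ftype in schema[message]:
            if ftype not in schema:
                struct[name] = ftype  # scalar field, nothing to recurse into
                continue
            # FIELD_NAME tracks the fully-qualified field, FIELD_TYPE the message type.
            key = f"{message}.{name}" if ref_type == "FIELD_NAME" else ftype
            repeats = seen[key]
            if repeats > 0:
                if tolerance < 0:
                    raise CircularReferenceError(f"circular reference at {key}")
                if repeats > tolerance:
                    continue  # tolerance exhausted: drop the field
            seen[key] += 1
            struct[name] = visit(ftype)
            seen[key] -= 1  # leave the path again on the way back up
        return struct

    return visit(root)


SCHEMA = {"Person": [("name", "string"), ("friend", "Person")]}

# FIELD_NAME enters Person.friend once, then drops the repeat.
print(expand_schema(SCHEMA, "Person", tolerance=0, ref_type="FIELD_NAME"))
# FIELD_TYPE already sees Person at the root, so friend is dropped immediately.
print(expand_schema(SCHEMA, "Person", tolerance=0, ref_type="FIELD_TYPE"))
```

Under these assumptions, `FIELD_TYPE` cuts off one level earlier than `FIELD_NAME` for the same tolerance, which matches the observation that type-based detection stops recursive navigation sooner.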