[ https://issues.apache.org/jira/browse/SPARK-36986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rodrigo Boavida updated SPARK-36986:
------------------------------------
    Summary: Improving schema filtering flexibility  (was: Improving external schema management flexibility)

> Improving schema filtering flexibility
> --------------------------------------
>
>                 Key: SPARK-36986
>                 URL: https://issues.apache.org/jira/browse/SPARK-36986
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Rodrigo Boavida
>            Priority: Major
>
> Our Spark usage requires us to build an external schema and pass it in while creating a Dataset.
> While working through this, I found a couple of optimizations that would greatly improve Spark's flexibility in handling externally managed schemas.
> Scope: the ability to retrieve a field's index and its StructField in one call, returned as a tuple. This means extending the StructType class with an additional method.
> This is what the function would look like:
>
> /**
>  * Returns the index and field structure by name.
>  * If the field is not found, returns None.
>  * Avoids two client calls/loops to obtain consolidated field info.
>  */
> def getIndexAndFieldByName(name: String): Option[(Int, StructField)] = {
>   val field = nameToField.get(name)
>   if (field.isDefined) {
>     Some((fieldIndex(name), field.get))
>   } else {
>     None
>   }
> }
>
> This is particularly useful from an efficiency perspective when parsing a JSON structure and checking, for every field, the name and field type already defined in the schema.
> I will create a corresponding branch for PR review, assuming there are no concerns with the above proposal.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
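For context, the proposed method can be sketched in a self-contained form. Since `nameToField` is internal to Spark's `StructType`, the example below uses stand-in case classes (hypothetical, not Spark's actual implementation) whose names mirror `org.apache.spark.sql.types`, to show how a single map lookup yields both the index and the field:

```scala
// Stand-in for org.apache.spark.sql.types.StructField (sketch only).
case class StructField(name: String, dataType: String)

// Stand-in for org.apache.spark.sql.types.StructType (sketch only).
case class StructType(fields: Array[StructField]) {
  // Lazily built name -> index map, playing the role of the
  // private nameToField map in the real StructType.
  private lazy val nameToIndex: Map[String, Int] =
    fields.zipWithIndex.map { case (f, i) => (f.name, i) }.toMap

  /**
   * Returns the index and field structure by name, or None if absent.
   * A single map lookup replaces the two calls/loops the issue describes.
   */
  def getIndexAndFieldByName(name: String): Option[(Int, StructField)] =
    nameToIndex.get(name).map(i => (i, fields(i)))
}

object Demo extends App {
  val schema = StructType(Array(
    StructField("id", "long"),
    StructField("name", "string")))

  println(schema.getIndexAndFieldByName("name")) // present field
  println(schema.getIndexAndFieldByName("missing")) // absent field -> None
}
```

Note that on released Spark versions a similar result is already reachable from the public API by combining `StructType.getFieldIndex(name)` (which returns `Option[Int]`) with an indexed access into `fields`, at the cost of a slightly less direct call site than the proposed method.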