[ https://issues.apache.org/jira/browse/SPARK-16203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551966#comment-16551966 ]
Herman van Hovell commented on SPARK-16203:
-------------------------------------------

[~nnicolini] adding {{regexp_extract_all}} makes sense. Can you file a new ticket for this? BTW there might already be one.

> regexp_extract to return an ArrayType(StringType())
> ---------------------------------------------------
>
>                 Key: SPARK-16203
>                 URL: https://issues.apache.org/jira/browse/SPARK-16203
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Max Moroz
>            Priority: Minor
>
> regexp_extract only returns a single matched group. If (as is often the case,
> e.g., in web log parsing) we need to parse the entire line and get all the
> groups, we'll need to call it as many times as there are groups.
> It's only a minor annoyance syntactically.
> But unless I misunderstand something, it would be very inefficient. (How
> would Spark know not to do multiple pattern matching operations, when only
> one is needed? Or does the optimizer actually check whether the patterns are
> identical, and if they are, avoid the repeated regex matching operations?)
> Would it be possible to have it return an array when the index is not
> specified (defaulting to None)?
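
To illustrate the request quoted above, here is a minimal sketch in plain Python using only the standard `re` module. It is not Spark code: the local `regexp_extract` function is a hypothetical stand-in that mimics the one-group-per-call behaviour described in the issue, and its `idx=None` branch models the proposed "return an array" behaviour. The log line and pattern are made up for the example, and returning an empty string or empty list on no match is an assumption of this sketch.

```python
import re
from typing import List, Optional, Union


def regexp_extract(
    text: str, pattern: str, idx: Optional[int] = None
) -> Union[str, List[str]]:
    """Hypothetical stand-in for the function discussed in the issue.

    With an index, return that one capture group (one regex search per call).
    With idx=None, return every capture group from a single search, which is
    the behaviour the issue proposes.
    """
    match = re.search(pattern, text)
    if idx is None:
        # Proposed behaviour: all groups from one matching pass.
        if match is None:
            return []
        return [group or "" for group in match.groups()]
    # Behaviour described in the issue: a single group per call.
    if match is None:
        return ""
    return match.group(idx) or ""


# Made-up web-log line and pattern with four capture groups.
LINE = "127.0.0.1 GET /index.html 200"
PATTERN = r"(\S+) (\S+) (\S+) (\d{3})"

# One call (and one regex search) per group.
per_group = [regexp_extract(LINE, PATTERN, i) for i in range(1, 5)]

# One call, one regex search, all groups at once.
all_groups = regexp_extract(LINE, PATTERN)
```

Both `per_group` and `all_groups` hold the same four strings; the difference is that the first form runs the regex four times in this sketch, while the second runs it once. Whether Spark itself repeats or deduplicates the matching work is exactly the open question raised in the issue, so this sketch says nothing about the optimizer.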