Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21370#discussion_r190485347

    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3040,6 +3040,50 @@ def test_csv_sampling_ratio(self):
                 .csv(rdd, samplingRatio=0.5).schema
             self.assertEquals(schema, StructType([StructField("_c0", IntegerType(), True)]))

    +    def _get_content(self, content):
    +        """
    +        Strips leading spaces from content up to the first '|' in each line.
    +        """
    +        import re
    +        pattern = re.compile(r'^ *\|', re.MULTILINE)
    --- End diff --

    We don't have to compile the pattern each time here since it's not going to be reused. You could just put this into re.sub I believe.
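    A minimal sketch of what this suggestion might look like, assuming the helper goes on to replace each match with an empty string (the quoted hunk ends before that line, so the replacement value is a guess). The standalone function name `strip_margin` is hypothetical and only used here for illustration; `re.sub` accepts the pattern string directly along with a `flags` keyword argument.

```python
import re


def strip_margin(content):
    """
    Strips leading spaces from content up to the first '|' in each line.

    Hypothetical standalone version of the helper in the diff. The pattern
    is passed straight to re.sub instead of being compiled first.
    """
    # Assumption: each match (leading spaces plus the '|') is replaced with ''.
    # flags must be passed by keyword; the fourth positional argument is count.
    return re.sub(r'^ *\|', '', content, flags=re.MULTILINE)
```

    Lines without a leading `|` margin are left untouched, and a `|` that appears after other characters on a line is not affected because the pattern is anchored at the start of each line.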