[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/11444#discussion_r55467820
--- Diff: python/pyspark/sql/types.py ---
@@ -681,6 +681,129 @@ def __eq__(self, other):
                           for v in [ArrayType, MapType, StructType])
+_FIXED_DECIMAL = re.compile("decimal\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)")
+
+
+_BRACKETS = {'(': ')', '[': ']', '{': '}'}
+
+
+def _parse_basic_datatype_string(s):
+    if s in _all_atomic_types.keys():
+        return _all_atomic_types[s]()
+    elif s == "int":
+        return IntegerType()
+    elif _FIXED_DECIMAL.match(s):
+        m = _FIXED_DECIMAL.match(s)
--- End diff --
I think so. The code is copied from https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L860-L864; we can improve it later.
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11444#issuecomment-194096618 ah i see, thanks! JIRA created: https://issues.apache.org/jira/browse/SPARK-13762
[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11444#issuecomment-194074866

@cloud-fan In Python, the list of names could be a single string, for example:

```
>>> from collections import namedtuple
>>> row = namedtuple("row", "a b c")
```

"a b c" is better than ["a", "b", "c"]
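The equivalence davies refers to can be checked directly in plain Python. A small sketch; the variable names `RowFromString` and `RowFromList` are illustrative and not part of the PR:

```python
from collections import namedtuple

# namedtuple accepts field names either as one whitespace-separated
# string or as a sequence of strings; both yield the same fields.
RowFromString = namedtuple("row", "a b c")
RowFromList = namedtuple("row", ["a", "b", "c"])

print(RowFromString._fields)                         # ('a', 'b', 'c')
print(RowFromString._fields == RowFromList._fields)  # True
```

This only illustrates the Python convention being cited; whether `createDataFrame` should follow it is the open question in the thread.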
[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11444#issuecomment-194044560 Follow-up JIRA created: https://issues.apache.org/jira/browse/SPARK-13757 @davies I don't quite understand the point about not supporting a schema string with only column names: `createDataFrame` can accept a list of strings as the schema and treats it as the column names.
[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/11444#discussion_r55449167
--- Diff: python/pyspark/sql/types.py ---
@@ -681,6 +681,129 @@ def __eq__(self, other):
                           for v in [ArrayType, MapType, StructType])
+_FIXED_DECIMAL = re.compile("decimal\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)")
+
+
+_BRACKETS = {'(': ')', '[': ']', '{': '}'}
+
+
+def _parse_basic_datatype_string(s):
+    if s in _all_atomic_types.keys():
+        return _all_atomic_types[s]()
+    elif s == "int":
+        return IntegerType()
+    elif _FIXED_DECIMAL.match(s):
+        m = _FIXED_DECIMAL.match(s)
--- End diff --
Is it possible to call _FIXED_DECIMAL.match(s) once?
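One way to do what this question asks is to bind the match result once and then test it. A minimal sketch: the function name `parse_decimal` and its tuple return value are hypothetical stand-ins (the code under review returns a `DecimalType`, which is left out here so the example runs without pyspark).

```python
import re

# Same pattern as in the quoted hunk.
_FIXED_DECIMAL = re.compile("decimal\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)")


def parse_decimal(s):
    """Return (precision, scale) for a 'decimal(p, s)' string, else None."""
    m = _FIXED_DECIMAL.match(s)  # the regex runs only once
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


print(parse_decimal("decimal(10, 2)"))  # (10, 2)
print(parse_decimal("string"))          # None
```

In an `if`/`elif` chain like the one in the diff, this usually means either computing `m` before the chain or moving the decimal case into a helper such as the one above.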
[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11444
[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11444#issuecomment-193985934 LGTM, merging this into master. We still do not support a schema string with only column names, or quoted names in a schema string. Could you create follow-up JIRAs for them?
[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11444#discussion_r55322763
--- Diff: python/pyspark/sql/types.py ---
@@ -681,6 +681,139 @@ def __eq__(self, other):
                           for v in [ArrayType, MapType, StructType])
+_FIXED_DECIMAL = re.compile("decimal\\((\\d+),(\\d+)\\)")
+
+
+def _parse_basic_datatype_string(s):
+    if s == "null":
+        return NullType()
+    elif s == "boolean":
+        return BooleanType()
+    elif s == "byte":
+        return ByteType()
+    elif s == "short":
+        return ShortType()
+    elif s == "int":
+        return IntegerType()
+    elif s == "long":
+        return LongType()
+    elif s == "float":
+        return FloatType()
+    elif s == "double":
+        return DoubleType()
+    elif s == "decimal":
+        return DecimalType()
+    elif _FIXED_DECIMAL.match(s):
+        m = _FIXED_DECIMAL.match(json_value)
+        return DecimalType(int(m.group(1)), int(m.group(2)))
+    elif s == "string":
+        return StringType()
+    elif s == "date":
+        return DateType()
+    elif s == "timestamp":
+        return TimestampType()
+    elif s == "binary":
+        return BinaryType()
+    else:
+        raise ValueError("Cannot parse datatype string: %s" % s)
+
+
+def _ignore_brackets_split(s, separator):
+    parts = []
+    buf = ""
+    level = 0
+    for c in s:
+        if c == "<":
+            level += 1
+            buf += c
+        elif c == ">":
+            if level == 0:
+                raise ValueError("Cannot parse datatype string: %s" % s)
+            level -= 1
+            buf += c
+        elif c == separator and level > 0:
+            buf += c
+        elif c == separator:
+            parts.append(buf)
+            buf = ""
+        else:
+            buf += c
+
+    if len(buf) == 0:
+        raise ValueError("Cannot parse datatype string: %s" % s)
+    parts.append(buf)
+    return parts
+
+
+def _parse_struct_type_string(s):
+    parts = _ignore_brackets_split(s, ",")
+    fields = []
+    for part in parts:
+        name_and_type = _ignore_brackets_split(part, ":")
+        if len(name_and_type) != 2:
+            raise ValueError("Cannot parse datatype string: %s" % s)
+        field_name = name_and_type[0].strip()
+        field_type = _parse_datatype_string(name_and_type[1])
+        fields.append(StructField(field_name, field_type))
+    return StructType(fields)
+
+
+def _parse_datatype_string(s):
--- End diff --
We could doc it for now (and create a JIRA for it)
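For readers following the quoted hunk, the bracket-aware split it introduces can be tried in isolation. The sketch below copies the logic of `_ignore_brackets_split` from that revision of the diff (only `<`/`>` nesting is tracked); the public-looking name `ignore_brackets_split` and the sample schema string are assumptions made for illustration.

```python
def ignore_brackets_split(s, separator):
    """Split s on separator, skipping separators nested inside <...>."""
    parts = []
    buf = ""
    level = 0  # current nesting depth of angle brackets
    for c in s:
        if c == "<":
            level += 1
            buf += c
        elif c == ">":
            if level == 0:
                # A closing bracket without a matching opening one.
                raise ValueError("Cannot parse datatype string: %s" % s)
            level -= 1
            buf += c
        elif c == separator and level > 0:
            buf += c  # separator inside brackets belongs to the type
        elif c == separator:
            parts.append(buf)
            buf = ""
        else:
            buf += c
    if len(buf) == 0:
        # Empty input or a trailing separator.
        raise ValueError("Cannot parse datatype string: %s" % s)
    parts.append(buf)
    return parts


print(ignore_brackets_split("a:int,b:map<string,int>", ","))
# ['a:int', 'b:map<string,int>']
```

The comma inside `map<string,int>` is kept because `level` is 1 when it is read, which is what lets `_parse_struct_type_string` split fields first and name/type pairs second.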
[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11444#issuecomment-193611600 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11444#issuecomment-193611603 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52629/
[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11444#issuecomment-193611469 **[Test build #52629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52629/consoleFull)** for PR 11444 at commit [`77ff36b`](https://github.com/apache/spark/commit/77ff36baa992f4350d7c2650bdda1d267cdc0e77).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/11444#discussion_r55316023
--- Diff: python/pyspark/sql/types.py ---
@@ -681,6 +681,139 @@ def __eq__(self, other):
                           for v in [ArrayType, MapType, StructType])
+_FIXED_DECIMAL = re.compile("decimal\\((\\d+),(\\d+)\\)")
+
+
+def _parse_basic_datatype_string(s):
+    if s == "null":
+        return NullType()
+    elif s == "boolean":
+        return BooleanType()
+    elif s == "byte":
+        return ByteType()
+    elif s == "short":
+        return ShortType()
+    elif s == "int":
+        return IntegerType()
+    elif s == "long":
+        return LongType()
+    elif s == "float":
+        return FloatType()
+    elif s == "double":
+        return DoubleType()
+    elif s == "decimal":
+        return DecimalType()
+    elif _FIXED_DECIMAL.match(s):
+        m = _FIXED_DECIMAL.match(json_value)
+        return DecimalType(int(m.group(1)), int(m.group(2)))
+    elif s == "string":
+        return StringType()
+    elif s == "date":
+        return DateType()
+    elif s == "timestamp":
+        return TimestampType()
+    elif s == "binary":
+        return BinaryType()
+    else:
+        raise ValueError("Cannot parse datatype string: %s" % s)
+
+
+def _ignore_brackets_split(s, separator):
+    parts = []
+    buf = ""
+    level = 0
+    for c in s:
+        if c == "<":
+            level += 1
+            buf += c
+        elif c == ">":
+            if level == 0:
+                raise ValueError("Cannot parse datatype string: %s" % s)
+            level -= 1
+            buf += c
+        elif c == separator and level > 0:
+            buf += c
+        elif c == separator:
+            parts.append(buf)
+            buf = ""
+        else:
+            buf += c
+
+    if len(buf) == 0:
+        raise ValueError("Cannot parse datatype string: %s" % s)
+    parts.append(buf)
+    return parts
+
+
+def _parse_struct_type_string(s):
+    parts = _ignore_brackets_split(s, ",")
+    fields = []
+    for part in parts:
+        name_and_type = _ignore_brackets_split(part, ":")
+        if len(name_and_type) != 2:
+            raise ValueError("Cannot parse datatype string: %s" % s)
+        field_name = name_and_type[0].strip()
+        field_type = _parse_datatype_string(name_and_type[1])
+        fields.append(StructField(field_name, field_type))
+    return StructType(fields)
+
+
+def _parse_datatype_string(s):
--- End diff --
Not now. Should we support it? It will make the parser more complicated, though...
[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11444#issuecomment-193604622 **[Test build #52629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52629/consoleFull)** for PR 11444 at commit [`77ff36b`](https://github.com/apache/spark/commit/77ff36baa992f4350d7c2650bdda1d267cdc0e77).