[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11444#discussion_r55467820
  
--- Diff: python/pyspark/sql/types.py ---
@@ -681,6 +681,129 @@ def __eq__(self, other):
   for v in [ArrayType, MapType, StructType])
 
 
+_FIXED_DECIMAL = re.compile("decimal\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)")
+
+
+_BRACKETS = {'(': ')', '[': ']', '{': '}'}
+
+
+def _parse_basic_datatype_string(s):
+    if s in _all_atomic_types.keys():
+        return _all_atomic_types[s]()
+    elif s == "int":
+        return IntegerType()
+    elif _FIXED_DECIMAL.match(s):
+        m = _FIXED_DECIMAL.match(s)
--- End diff --

I think so. The code is copied from 
https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L860-L864,
 so we can improve them later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-08 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11444#issuecomment-194096618
  
Ah, I see, thanks! JIRA created: 
https://issues.apache.org/jira/browse/SPARK-13762





[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-08 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11444#issuecomment-194074866
  
@cloud-fan In Python, the list of names could be a single string, for 
example:
```
>>> from collections import namedtuple
>>> row = namedtuple("row", "a b c")
```

"a b c" is better than ["a", "b", "c"]

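As a quick illustration of the point above: `collections.namedtuple` accepts its field names either as one whitespace-separated string or as a sequence of strings, and both forms yield the same fields. The class names below are made up for the example.

```python
from collections import namedtuple

# Field names given as a single whitespace-separated string.
RowFromString = namedtuple("row", "a b c")

# The same field names given as a list of strings.
RowFromList = namedtuple("row", ["a", "b", "c"])

# Both spellings produce identical field tuples.
fields_from_string = RowFromString._fields
fields_from_list = RowFromList._fields

# Instances behave the same way regardless of how the names were declared.
r = RowFromString(1, 2, 3)
```

The string form is simply shorter to type when the names are plain identifiers.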





[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-08 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11444#issuecomment-194044560
  
Follow-up JIRA created: https://issues.apache.org/jira/browse/SPARK-13757

@davies I don't quite understand "We still do not support a schema string 
with only column names". `createDataFrame` can accept a list of strings as 
the schema and treats it as the column names.





[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-08 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/11444#discussion_r55449167
  
--- Diff: python/pyspark/sql/types.py ---
@@ -681,6 +681,129 @@ def __eq__(self, other):
   for v in [ArrayType, MapType, StructType])
 
 
+_FIXED_DECIMAL = re.compile("decimal\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)")
+
+
+_BRACKETS = {'(': ')', '[': ']', '{': '}'}
+
+
+def _parse_basic_datatype_string(s):
+    if s in _all_atomic_types.keys():
+        return _all_atomic_types[s]()
+    elif s == "int":
+        return IntegerType()
+    elif _FIXED_DECIMAL.match(s):
+        m = _FIXED_DECIMAL.match(s)
--- End diff --

Is it possible to call `_FIXED_DECIMAL.match(s)` only once?
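
One way to avoid the double call, sketched here under the assumption that only the precision and scale are needed: bind the match result to a name once and branch on it. `parse_decimal_params` is a hypothetical helper for illustration, not a function in `pyspark.sql.types`.

```python
import re

# Same pattern as in the diff, written as a raw string.
_FIXED_DECIMAL = re.compile(r"decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_decimal_params(s):
    """Return (precision, scale) if s looks like decimal(p, s), else None.

    Hypothetical helper: the regex is evaluated a single time and the
    resulting match object is reused for both the test and the groups.
    """
    m = _FIXED_DECIMAL.match(s)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

In the `elif` chain from the diff, the same effect would need the match to be computed before the chain or the decimal branch to be restructured, since Python 2 has no way to assign inside a condition.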





[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11444





[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-08 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11444#issuecomment-193985934
  
LGTM, merging this into master.

We still do not support a schema string with only column names, or quoted 
names in a schema string. Could you create follow-up JIRAs for them?





[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-07 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11444#discussion_r55322763
  
--- Diff: python/pyspark/sql/types.py ---
@@ -681,6 +681,139 @@ def __eq__(self, other):
   for v in [ArrayType, MapType, StructType])
 
 
+_FIXED_DECIMAL = re.compile("decimal\\((\\d+),(\\d+)\\)")
+
+
+def _parse_basic_datatype_string(s):
+    if s == "null":
+        return NullType()
+    elif s == "boolean":
+        return BooleanType()
+    elif s == "byte":
+        return ByteType()
+    elif s == "short":
+        return ShortType()
+    elif s == "int":
+        return IntegerType()
+    elif s == "long":
+        return LongType()
+    elif s == "float":
+        return FloatType()
+    elif s == "double":
+        return DoubleType()
+    elif s == "decimal":
+        return DecimalType()
+    elif _FIXED_DECIMAL.match(s):
+        m = _FIXED_DECIMAL.match(json_value)
+        return DecimalType(int(m.group(1)), int(m.group(2)))
+    elif s == "string":
+        return StringType()
+    elif s == "date":
+        return DateType()
+    elif s == "timestamp":
+        return TimestampType()
+    elif s == "binary":
+        return BinaryType()
+    else:
+        raise ValueError("Cannot parse datatype string: %s" % s)
+
+
+def _ignore_brackets_split(s, separator):
+    parts = []
+    buf = ""
+    level = 0
+    for c in s:
+        if c == "<":
+            level += 1
+            buf += c
+        elif c == ">":
+            if level == 0:
+                raise ValueError("Cannot parse datatype string: %s" % s)
+            level -= 1
+            buf += c
+        elif c == separator and level > 0:
+            buf += c
+        elif c == separator:
+            parts.append(buf)
+            buf = ""
+        else:
+            buf += c
+
+    if len(buf) == 0:
+        raise ValueError("Cannot parse datatype string: %s" % s)
+    parts.append(buf)
+    return parts
+
+
+def _parse_struct_type_string(s):
+    parts = _ignore_brackets_split(s, ",")
+    fields = []
+    for part in parts:
+        name_and_type = _ignore_brackets_split(part, ":")
+        if len(name_and_type) != 2:
+            raise ValueError("Cannot parse datatype string: %s" % s)
+        field_name = name_and_type[0].strip()
+        field_type = _parse_datatype_string(name_and_type[1])
+        fields.append(StructField(field_name, field_type))
+    return StructType(fields)
+
+
+def _parse_datatype_string(s):
--- End diff --

We could document it for now (and create a JIRA for it).





[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11444#issuecomment-193611600
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11444#issuecomment-193611603
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52629/





[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11444#issuecomment-193611469
  
**[Test build #52629 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52629/consoleFull)**
 for PR 11444 at commit 
[`77ff36b`](https://github.com/apache/spark/commit/77ff36baa992f4350d7c2650bdda1d267cdc0e77).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11444#discussion_r55316023
  
--- Diff: python/pyspark/sql/types.py ---
@@ -681,6 +681,139 @@ def __eq__(self, other):
   for v in [ArrayType, MapType, StructType])
 
 
+_FIXED_DECIMAL = re.compile("decimal\\((\\d+),(\\d+)\\)")
+
+
+def _parse_basic_datatype_string(s):
+    if s == "null":
+        return NullType()
+    elif s == "boolean":
+        return BooleanType()
+    elif s == "byte":
+        return ByteType()
+    elif s == "short":
+        return ShortType()
+    elif s == "int":
+        return IntegerType()
+    elif s == "long":
+        return LongType()
+    elif s == "float":
+        return FloatType()
+    elif s == "double":
+        return DoubleType()
+    elif s == "decimal":
+        return DecimalType()
+    elif _FIXED_DECIMAL.match(s):
+        m = _FIXED_DECIMAL.match(json_value)
+        return DecimalType(int(m.group(1)), int(m.group(2)))
+    elif s == "string":
+        return StringType()
+    elif s == "date":
+        return DateType()
+    elif s == "timestamp":
+        return TimestampType()
+    elif s == "binary":
+        return BinaryType()
+    else:
+        raise ValueError("Cannot parse datatype string: %s" % s)
+
+
+def _ignore_brackets_split(s, separator):
+    parts = []
+    buf = ""
+    level = 0
+    for c in s:
+        if c == "<":
+            level += 1
+            buf += c
+        elif c == ">":
+            if level == 0:
+                raise ValueError("Cannot parse datatype string: %s" % s)
+            level -= 1
+            buf += c
+        elif c == separator and level > 0:
+            buf += c
+        elif c == separator:
+            parts.append(buf)
+            buf = ""
+        else:
+            buf += c
+
+    if len(buf) == 0:
+        raise ValueError("Cannot parse datatype string: %s" % s)
+    parts.append(buf)
+    return parts
+
+
+def _parse_struct_type_string(s):
+    parts = _ignore_brackets_split(s, ",")
+    fields = []
+    for part in parts:
+        name_and_type = _ignore_brackets_split(part, ":")
+        if len(name_and_type) != 2:
+            raise ValueError("Cannot parse datatype string: %s" % s)
+        field_name = name_and_type[0].strip()
+        field_type = _parse_datatype_string(name_and_type[1])
+        fields.append(StructField(field_name, field_type))
+    return StructType(fields)
+
+
+def _parse_datatype_string(s):
--- End diff --

Not now. Should we support it? It would make the parser more complicated, 
though...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13593][SQL] improve the `createDataFram...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11444#issuecomment-193604622
  
**[Test build #52629 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52629/consoleFull)**
 for PR 11444 at commit 
[`77ff36b`](https://github.com/apache/spark/commit/77ff36baa992f4350d7c2650bdda1d267cdc0e77).

