[jira] [Updated] (SPARK-37325) Result vector from pandas_udf was not the required length

2021-11-16 Thread liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liu updated SPARK-37325:

Description: 
 
{code:python}
schema = StructType([
    StructField("node", StringType())
])
rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
rdd = rdd.map(lambda obj: {'node': obj})
df_node = spark.createDataFrame(rdd, schema=schema)

df_fname = spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()

@pandas_udf(IntegerType(), PandasUDFType.SCALAR)
def udf_match(word: pd.Series) -> pd.Series:
    my_Series = pd_fname.squeeze()  # DataFrame to Series
    num = int(my_Series.str.contains(word.array[0]).sum())
    return pd.Series(num)

df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
{code}
 

Hi, I have two DataFrames, and when I try the method above, I get this error:
{code:java}
RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1{code}
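For context, a scalar pandas_udf is required to return a Series with exactly as many values as the input batch it receives; Spark splits the column into Arrow batches (here, 100 rows) and checks the returned length after each call. The pure-pandas sketch below (hypothetical, not Spark's actual implementation) mirrors the posted UDF and reproduces the check behind this error.

```python
import pandas as pd

def udf_match(word: pd.Series) -> pd.Series:
    # Like the posted UDF: collapses the whole batch into a single count,
    # so the result has length 1 no matter how many rows came in.
    return pd.Series(1)

batch = pd.Series(["a"] * 100)   # Spark feeds the UDF batches of rows, e.g. 100
result = udf_match(batch)
if len(result) != len(batch):    # the length check Spark performs per batch
    print(f"expected {len(batch)}, got {len(result)}")  # expected 100, got 1
```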
I would be really thankful for any help.

 

PS: I don't think the method itself is the problem; I created sample data of the same shape and it ran successfully. However, the error appears when I use the real data. I checked the data but can't figure out the cause.

Does anyone know what causes it?
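If it helps, a likely fix is to return one count per input row rather than one count per batch, e.g. by applying the match over the whole input Series. This is a hedged sketch with made-up sample filenames (`fname_series` stands in for `pd_fname.squeeze()`; the names are hypothetical, not the real dataset):

```python
import pandas as pd

# stand-in for pd_fname.squeeze(): the Series of filenames
fname_series = pd.Series(["apple_pie", "banana_bread", "apple_tart"])

def match_counts(words: pd.Series) -> pd.Series:
    # one count per input word, so output length == input length
    return words.apply(
        lambda w: int(fname_series.str.contains(w, regex=False).sum())
    )

# In Spark, this body would be wrapped with
# @pandas_udf(IntegerType(), PandasUDFType.SCALAR).
print(match_counts(pd.Series(["apple", "banana", "cherry"])).tolist())  # [2, 1, 0]
```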


> Result vector from pandas_udf was not the required length
> -
>
> Key: SPARK-37325
> URL: https://issues.apache.org/jira/browse/SPARK-37325
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
> Environment: 1
>Reporter: liu
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37325) Result vector from pandas_udf was not the required length

2021-11-16 Thread liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liu updated SPARK-37325:

Description: 
 
{code:java}
schema = StructType([
StructField("node", StringType())
])
rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
rdd = rdd.map(lambda obj: {'node': obj})
df_node = spark.createDataFrame(rdd, schema=schema)

df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()

@pandas_udf(IntegerType(), PandasUDFType.SCALAR)
def udf_match(word: pd.Series) -> pd.Series:
  my_Series = pd_fname.squeeze() # dataframe to Series
  num = int(my_Series.str.contains(word.array[0]).sum())
  return pd.Series(num)

df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
{code}
 

Hi, I have two dataframe, and I try above method, however, I get this
{code:java}
RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1{code}
it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone know what it cause?

  was:
 
{code:java}
schema = StructType([
StructField("node", StringType())
])
rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
rdd = rdd.map(lambda obj: {'node': obj})
df_node = spark.createDataFrame(rdd, schema=schema)

df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()

@pandas_udf(IntegerType(), PandasUDFType.SCALAR)
def udf_match(word: pd.Series) -> pd.Series:
  my_Series = pd_fname.squeeze() # dataframe to Series
  num = int(my_Series.str.contains(word.array[0]).sum())
  return pd.Series(num)

df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
{code}
 

Hi, I have two dataframe, and I try above method, however, I get this
{code:java}
RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1{code}
it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone thinks what it cause?


> Result vector from pandas_udf was not the required length
> -
>
> Key: SPARK-37325
> URL: https://issues.apache.org/jira/browse/SPARK-37325
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
> Environment: 1
>Reporter: liu
>Priority: Major
>
>  
> {code:java}
> schema = StructType([
> StructField("node", StringType())
> ])
> rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
> rdd = rdd.map(lambda obj: {'node': obj})
> df_node = spark.createDataFrame(rdd, schema=schema)
> df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
> pd_fname = df_fname.select('fname').toPandas()
> @pandas_udf(IntegerType(), PandasUDFType.SCALAR)
> def udf_match(word: pd.Series) -> pd.Series:
>   my_Series = pd_fname.squeeze() # dataframe to Series
>   num = int(my_Series.str.contains(word.array[0]).sum())
>   return pd.Series(num)
> df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
> {code}
>  
> Hi, I have two dataframe, and I try above method, however, I get this
> {code:java}
> RuntimeError: Result vector from pandas_udf was not the required length: 
> expected 100, got 1{code}
> it will be really thankful, if there is any helps
>  
> PS: for the method itself, I think there is no problem, I create same sample 
> data to verify it successfully, however, when I use the real data error came. 
> I checked the data, can't figure out,
> does anyone know what it cause?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37325) Result vector from pandas_udf was not the required length

2021-11-16 Thread liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liu updated SPARK-37325:

Description: 
 
{code:java}
schema = StructType([
StructField("node", StringType())
])
rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
rdd = rdd.map(lambda obj: {'node': obj})
df_node = spark.createDataFrame(rdd, schema=schema)

df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()

@pandas_udf(IntegerType(), PandasUDFType.SCALAR)
def udf_match(word: pd.Series) -> pd.Series:
  my_Series = pd_fname.squeeze() # dataframe to Series
  num = int(my_Series.str.contains(word.array[0]).sum())
  return pd.Series(num)

df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
{code}
 

Hi, I have two dataframe, and I try above method, however, I get this
{code:java}
RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1{code}
it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone thinks where it cause?

  was:
 
{code:java}
schema = StructType([
StructField("node", StringType())
])
rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
rdd = rdd.map(lambda obj: {'node': obj})
df_node = spark.createDataFrame(rdd, schema=schema)

df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()

@pandas_udf(IntegerType(), PandasUDFType.SCALAR)
def udf_match(word: pd.Series) -> pd.Series:
  my_Series = pd_fname.squeeze() # dataframe to Series
  num = int(my_Series.str.contains(word.array[0]).sum())
     return pd.Series(num)

df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
{code}
 

Hi, I have two dataframe, and I try above method, however, I get this
{code:java}
RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1{code}
it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone thinks where it cause?


> Result vector from pandas_udf was not the required length
> -
>
> Key: SPARK-37325
> URL: https://issues.apache.org/jira/browse/SPARK-37325
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
> Environment: 1
>Reporter: liu
>Priority: Major
>
>  
> {code:java}
> schema = StructType([
> StructField("node", StringType())
> ])
> rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
> rdd = rdd.map(lambda obj: {'node': obj})
> df_node = spark.createDataFrame(rdd, schema=schema)
> df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
> pd_fname = df_fname.select('fname').toPandas()
> @pandas_udf(IntegerType(), PandasUDFType.SCALAR)
> def udf_match(word: pd.Series) -> pd.Series:
>   my_Series = pd_fname.squeeze() # dataframe to Series
>   num = int(my_Series.str.contains(word.array[0]).sum())
>   return pd.Series(num)
> df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
> {code}
>  
> Hi, I have two dataframe, and I try above method, however, I get this
> {code:java}
> RuntimeError: Result vector from pandas_udf was not the required length: 
> expected 100, got 1{code}
> it will be really thankful, if there is any helps
>  
> PS: for the method itself, I think there is no problem, I create same sample 
> data to verify it successfully, however, when I use the real data error came. 
> I checked the data, can't figure out,
> does anyone thinks where it cause?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37325) Result vector from pandas_udf was not the required length

2021-11-16 Thread liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liu updated SPARK-37325:

Description: 
 
{code:java}
schema = StructType([
StructField("node", StringType())
])
rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
rdd = rdd.map(lambda obj: {'node': obj})
df_node = spark.createDataFrame(rdd, schema=schema)

df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()

@pandas_udf(IntegerType(), PandasUDFType.SCALAR)
def udf_match(word: pd.Series) -> pd.Series:
  my_Series = pd_fname.squeeze() # dataframe to Series
  num = int(my_Series.str.contains(word.array[0]).sum())
     return pd.Series(num)

df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
{code}
 

Hi, I have two dataframe, and I try above method, however, I get this
{code:java}
RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1{code}
it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone thinks where it cause?

  was:
 
{code:java}
schema = StructType([
StructField("node", StringType())
])
rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
rdd = rdd.map(lambda obj: {'node': obj})
df_node = spark.createDataFrame(rdd, schema=schema)
df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()
@pandas_udf(IntegerType(), PandasUDFType.SCALAR)
def udf_match(word: pd.Series) -> pd.Series:
  my_Series = pd_fname.squeeze() # dataframe to Series
  num = int(my_Series.str.contains(word.array[0]).sum())
     return pd.Series(num)
df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
{code}
 

Hi, I have two dataframe, and I try above method, however, I get this
{code:java}
RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1{code}

it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone thinks where it cause?


> Result vector from pandas_udf was not the required length
> -
>
> Key: SPARK-37325
> URL: https://issues.apache.org/jira/browse/SPARK-37325
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
> Environment: 1
>Reporter: liu
>Priority: Major
>
>  
> {code:java}
> schema = StructType([
> StructField("node", StringType())
> ])
> rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
> rdd = rdd.map(lambda obj: {'node': obj})
> df_node = spark.createDataFrame(rdd, schema=schema)
> df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
> pd_fname = df_fname.select('fname').toPandas()
> @pandas_udf(IntegerType(), PandasUDFType.SCALAR)
> def udf_match(word: pd.Series) -> pd.Series:
>   my_Series = pd_fname.squeeze() # dataframe to Series
>   num = int(my_Series.str.contains(word.array[0]).sum())
>      return pd.Series(num)
> df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
> {code}
>  
> Hi, I have two dataframe, and I try above method, however, I get this
> {code:java}
> RuntimeError: Result vector from pandas_udf was not the required length: 
> expected 100, got 1{code}
> it will be really thankful, if there is any helps
>  
> PS: for the method itself, I think there is no problem, I create same sample 
> data to verify it successfully, however, when I use the real data error came. 
> I checked the data, can't figure out,
> does anyone thinks where it cause?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37325) Result vector from pandas_udf was not the required length

2021-11-16 Thread liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liu updated SPARK-37325:

Description: 
 
{code:java}
schema = StructType([
StructField("node", StringType())
])
rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
rdd = rdd.map(lambda obj: {'node': obj})
df_node = spark.createDataFrame(rdd, schema=schema)
df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()
@pandas_udf(IntegerType(), PandasUDFType.SCALAR)
def udf_match(word: pd.Series) -> pd.Series:
  my_Series = pd_fname.squeeze() # dataframe to Series
  num = int(my_Series.str.contains(word.array[0]).sum())
     return pd.Series(num)
df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
{code}
 

Hi, I have two dataframe, and I try above method, however, I get this
{code:java}
RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1{code}

it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone thinks where it cause?

  was:
 
{code:java}
schema = StructType([
StructField("node", StringType())
])
rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
rdd = rdd.map(lambda obj: {'node': obj})
df_node = spark.createDataFrame(rdd, schema=schema)
df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()
@pandas_udf(IntegerType(), PandasUDFType.SCALAR)
def udf_match(word: pd.Series) -> pd.Series:
  my_Series = pd_fname.squeeze() # dataframe to Series
  num = int(my_Series.str.contains(word.array[0]).sum())
     return pd.Series(num)
df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
{code}
 

Hi, I have two dataframe, and I try above method, however, I get this
{{}}
{code:java}
RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1{code}
{{}}
it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone thinks where it cause?


> Result vector from pandas_udf was not the required length
> -
>
> Key: SPARK-37325
> URL: https://issues.apache.org/jira/browse/SPARK-37325
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
> Environment: 1
>Reporter: liu
>Priority: Major
>
>  
> {code:java}
> schema = StructType([
> StructField("node", StringType())
> ])
> rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
> rdd = rdd.map(lambda obj: {'node': obj})
> df_node = spark.createDataFrame(rdd, schema=schema)
> df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
> pd_fname = df_fname.select('fname').toPandas()
> @pandas_udf(IntegerType(), PandasUDFType.SCALAR)
> def udf_match(word: pd.Series) -> pd.Series:
>   my_Series = pd_fname.squeeze() # dataframe to Series
>   num = int(my_Series.str.contains(word.array[0]).sum())
>      return pd.Series(num)
> df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
> {code}
>  
> Hi, I have two dataframe, and I try above method, however, I get this
> {code:java}
> RuntimeError: Result vector from pandas_udf was not the required length: 
> expected 100, got 1{code}
> it will be really thankful, if there is any helps
>  
> PS: for the method itself, I think there is no problem, I create same sample 
> data to verify it successfully, however, when I use the real data error came. 
> I checked the data, can't figure out,
> does anyone thinks where it cause?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37325) Result vector from pandas_udf was not the required length

2021-11-16 Thread liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liu updated SPARK-37325:

Description: 
 
{code:java}
schema = StructType([
StructField("node", StringType())
])
rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
rdd = rdd.map(lambda obj: {'node': obj})
df_node = spark.createDataFrame(rdd, schema=schema)
df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()
@pandas_udf(IntegerType(), PandasUDFType.SCALAR)
def udf_match(word: pd.Series) -> pd.Series:
  my_Series = pd_fname.squeeze() # dataframe to Series
  num = int(my_Series.str.contains(word.array[0]).sum())
     return pd.Series(num)
df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
{code}
 

Hi, I have two dataframe, and I try above method, however, I get this
{{}}
{code:java}
RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1{code}
{{}}
it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone thinks where it cause?

  was:
 
{code:java}
schema = StructType([
StructField("node", StringType())
])
rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
rdd = rdd.map(lambda obj: {'node': obj})
df_node = spark.createDataFrame(rdd, schema=schema)
df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()
@pandas_udf(IntegerType(), PandasUDFType.SCALAR)
def udf_match(word: pd.Series) -> pd.Series:
  my_Series = pd_fname.squeeze() # dataframe to Series
  num = int(my_Series.str.contains(word.array[0]).sum())
     return pd.Series(num)
df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
{code}
 

Hi, I have two dataframe, and I try above method, however, I get this
{{RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1}}
it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone thinks where it cause?


> Result vector from pandas_udf was not the required length
> -
>
> Key: SPARK-37325
> URL: https://issues.apache.org/jira/browse/SPARK-37325
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
> Environment: 1
>Reporter: liu
>Priority: Major
>
>  
> {code:java}
> schema = StructType([
> StructField("node", StringType())
> ])
> rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
> rdd = rdd.map(lambda obj: {'node': obj})
> df_node = spark.createDataFrame(rdd, schema=schema)
> df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
> pd_fname = df_fname.select('fname').toPandas()
> @pandas_udf(IntegerType(), PandasUDFType.SCALAR)
> def udf_match(word: pd.Series) -> pd.Series:
>   my_Series = pd_fname.squeeze() # dataframe to Series
>   num = int(my_Series.str.contains(word.array[0]).sum())
>      return pd.Series(num)
> df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
> {code}
>  
> Hi, I have two dataframe, and I try above method, however, I get this
> {{}}
> {code:java}
> RuntimeError: Result vector from pandas_udf was not the required length: 
> expected 100, got 1{code}
> {{}}
> it will be really thankful, if there is any helps
>  
> PS: for the method itself, I think there is no problem, I create same sample 
> data to verify it successfully, however, when I use the real data error came. 
> I checked the data, can't figure out,
> does anyone thinks where it cause?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37325) Result vector from pandas_udf was not the required length

2021-11-16 Thread liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liu updated SPARK-37325:

Description: 
 
{code:java}
schema = StructType([
StructField("node", StringType())
])
rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
rdd = rdd.map(lambda obj: {'node': obj})
df_node = spark.createDataFrame(rdd, schema=schema)
df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()
@pandas_udf(IntegerType(), PandasUDFType.SCALAR)
def udf_match(word: pd.Series) -> pd.Series:
  my_Series = pd_fname.squeeze() # dataframe to Series
  num = int(my_Series.str.contains(word.array[0]).sum())
     return pd.Series(num)
df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
{code}
 

Hi, I have two dataframe, and I try above method, however, I get this
{{RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1}}
it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone thinks where it cause?

  was:
schema = StructType([
StructField("node", StringType())
])
{{rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")}}
{{rdd = rdd.map(lambda obj: \{'node': obj})}}
{{df_node = spark.createDataFrame(rdd, schema=schema)}}

df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()

{{@pandas_udf(IntegerType(), PandasUDFType.SCALAR)}}
{{def udf_match(word: pd.Series) -> pd.Series:}}
{{  my_Series = pd_fname.squeeze() # dataframe to Series}}
{{  num = int(my_Series.str.contains(word.array[0]).sum())}}
     return pd.Series(num)

{{df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))}}

Hi, I have two dataframe, and I try above method, however, I get this
{{RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1}}
it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone thinks where it cause?


> Result vector from pandas_udf was not the required length
> -
>
> Key: SPARK-37325
> URL: https://issues.apache.org/jira/browse/SPARK-37325
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
> Environment: 1
>Reporter: liu
>Priority: Major
>
>  
> {code:java}
> schema = StructType([
> StructField("node", StringType())
> ])
> rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")
> rdd = rdd.map(lambda obj: {'node': obj})
> df_node = spark.createDataFrame(rdd, schema=schema)
> df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
> pd_fname = df_fname.select('fname').toPandas()
> @pandas_udf(IntegerType(), PandasUDFType.SCALAR)
> def udf_match(word: pd.Series) -> pd.Series:
>   my_Series = pd_fname.squeeze() # dataframe to Series
>   num = int(my_Series.str.contains(word.array[0]).sum())
>      return pd.Series(num)
> df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))
> {code}
>  
> Hi, I have two dataframe, and I try above method, however, I get this
> {{RuntimeError: Result vector from pandas_udf was not the required length: 
> expected 100, got 1}}
> it will be really thankful, if there is any helps
>  
> PS: for the method itself, I think there is no problem, I create same sample 
> data to verify it successfully, however, when I use the real data error came. 
> I checked the data, can't figure out,
> does anyone thinks where it cause?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37325) Result vector from pandas_udf was not the required length

2021-11-16 Thread liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liu updated SPARK-37325:

Description: 
schema = StructType([
StructField("node", StringType())
])
{{rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")}}
{{rdd = rdd.map(lambda obj: \{'node': obj})}}
{{df_node = spark.createDataFrame(rdd, schema=schema)}}

df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()

{{@pandas_udf(IntegerType(), PandasUDFType.SCALAR)}}
{{def udf_match(word: pd.Series) -> pd.Series:}}
{{  my_Series = pd_fname.squeeze() # dataframe to Series}}
{{  num = int(my_Series.str.contains(word.array[0]).sum())}}
     return pd.Series(num)

{{df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))}}

Hi, I have two dataframe, and I try above method, however, I get this
{{RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1}}
it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the real data error came. I 
checked the data, can't figure out,

does anyone thinks where it cause?

  was:
schema = StructType([
StructField("node", StringType())
])
{{rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")}}
{{rdd = rdd.map(lambda obj: \{'node': obj})}}
{{df_node = spark.createDataFrame(rdd, schema=schema)}}

df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
pd_fname = df_fname.select('fname').toPandas()

{{@pandas_udf(IntegerType(), PandasUDFType.SCALAR)}}
{{def udf_match(word: pd.Series) -> pd.Series:}}
{{  my_Series = pd_fname.squeeze() # dataframe to Series}}
{{  num = int(my_Series.str.contains(word.array[0]).sum())}}
     return pd.Series(num)

{{df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))}}


Hi, I have two dataframe, and I try above method, however, I get this
{{RuntimeError: Result vector from pandas_udf was not the required length: 
expected 100, got 1}}
it will be really thankful, if there is any helps

 

PS: for the method itself, I think there is no problem, I create same sample 
data to verify it successfully, however, when I use the really data it came. I 
checked the data, can't figure out,

does anyone thinks where it cause?


> Result vector from pandas_udf was not the required length
> -
>
> Key: SPARK-37325
> URL: https://issues.apache.org/jira/browse/SPARK-37325
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
> Environment: 1
>Reporter: liu
>Priority: Major
>
> schema = StructType([
> StructField("node", StringType())
> ])
> {{rdd = sc.textFile("hdfs:///user/liubiao/KG/graph_dict.txt")}}
> {{rdd = rdd.map(lambda obj: \{'node': obj})}}
> {{df_node = spark.createDataFrame(rdd, schema=schema)}}
> df_fname =spark.read.parquet("hdfs:///user/liubiao/KG/fnames.parquet")
> pd_fname = df_fname.select('fname').toPandas()
> {{@pandas_udf(IntegerType(), PandasUDFType.SCALAR)}}
> {{def udf_match(word: pd.Series) -> pd.Series:}}
> {{  my_Series = pd_fname.squeeze() # dataframe to Series}}
> {{  num = int(my_Series.str.contains(word.array[0]).sum())}}
>      return pd.Series(num)
> {{df = df_node.withColumn("match_fname_num", udf_match(df_node["node"]))}}
> Hi, I have two dataframe, and I try above method, however, I get this
> {{RuntimeError: Result vector from pandas_udf was not the required length: 
> expected 100, got 1}}
> it will be really thankful, if there is any helps
>  
> PS: for the method itself, I think there is no problem, I create same sample 
> data to verify it successfully, however, when I use the real data error came. 
> I checked the data, can't figure out,
> does anyone thinks where it cause?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


