Problem with concatenating two dataframes
In the following code, I am trying to create some key-value pairs in a
dictionary where the first element is a name and the second element is a
dataframe.
# Creating a dictionary
data = {'Value':[0,0,0]}
kernel_df = pd.DataFrame(data, index=['M1','M2','M3'])
dict = {'dummy':kernel_df}
# dummy -> Value
# M1 0
# M2 0
# M3 0
Then I read a file and create some batches and compare the name in the batch
with the stored names in dictionary. If it doesn't exist, a new key-value (name
and dataframe) is created. Otherwise, the Value column is appended to the
existing dataframe.
df = pd.read_csv('test.batch.csv')
print(df)
for i in range(0, len(df), 3):
    print("\n--BATCH BEGIN")
    batch_df = df.iloc[i:i+3]
    name = batch_df.loc[i].at["Name"]
    values = batch_df.loc[:,["Value"]]
    print(name)
    print(values)
    print("--BATCH END")
    if name in dict:
        # Append values to the existing key
        dict[name] = pd.concat( dict[name],values )  # ERROR
    else:
        # Create a new pair in dictionary
        dict[name] = values;
As you can see in the output, the concat statement raises an error.
ID Name Metric Value
0 0 K1 M1 10
1 0 K1 M2 5
2 0 K1 M3 10
3 1 K2 M1 20
4 1 K2 M2 10
5 1 K2 M3 15
6 2 K1 M1 2
7 2 K1 M2 2
8 2 K1 M3 2
--BATCH BEGIN
K1
Value
0 10
1 5
2 10
--BATCH END
--BATCH BEGIN
K2
Value
3 20
4 10
5 15
--BATCH END
--BATCH BEGIN
K1
Value
6 2
7 2
8 2
--BATCH END
As it reaches the concat() statement, I get this error:
TypeError: first argument must be an iterable of pandas objects, you passed an
object of type "DataFrame"
Based on the definition I wrote at the beginning of the code, "dict[name]"
should be a dataframe. Isn't it?
How can I fix that?
Regards,
Mahmood
--
https://mail.python.org/mailman/listinfo/python-list
Re: Problem with concatenating two dataframes
>Try this instead:
>
>
> dict[name] = pd.concat([dict[name], values])
OK. That fixed the problem, however, I see that they are concatenated
vertically. How can I change that to horizontal? The printed dictionary in the
end looks like
{'dummy': Value
M1 0
M2 0
M3 0, 'K1': Value
0 10
1 5
2 10
6 2
7 2
8 2, 'K2': Value
3 20
4 10
5 15}
For K1, there should be three rows and two columns labeled as Value.
Regards,
Mahmood
Re: Problem with concatenating two dataframes
>The second argument of pd.concat is 'axis', which defaults to 0. Try
>using 1 instead of 0.
Unfortunately, that doesn't help...
dict[name] = pd.concat( [dict[name],values], axis=1 )
{'dummy': Value
M1 0
M2 0
M3 0, 'K1': Value Value
0 10.0 NaN
1 5.0 NaN
2 10.0 NaN
6 NaN 2.0
7 NaN 2.0
8 NaN 2.0, 'K2': Value
3 20
4 10
5 15}
Regards,
Mahmood
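The NaN cells in the output above are consistent with pd.concat aligning rows on index labels when axis=1: the first K1 batch carries labels 0..2 and the second carries 6..8, so no rows line up. A minimal sketch of one way to get three rows and two columns, assuming the row labels of each batch can be discarded (the variable names below are illustrative, not from the original script):

```python
import pandas as pd

# Two batches for the same key, with the index labels they had in the thread.
existing = pd.DataFrame({"Value": [10, 5, 10]}, index=[0, 1, 2])
new_batch = pd.DataFrame({"Value": [2, 2, 2]}, index=[6, 7, 8])

# axis=1 aligns on index labels, so disjoint labels give 6 rows padded with NaN.
misaligned = pd.concat([existing, new_batch], axis=1)

# Dropping the old labels first makes both frames share the labels 0..2.
side_by_side = pd.concat(
    [existing.reset_index(drop=True), new_batch.reset_index(drop=True)],
    axis=1,
)
print(side_by_side)
```

In the loop from the first message, this would amount to calling reset_index(drop=True) on `values` before storing or concatenating it.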
Returning the index of a row in dataframe
Hi
In the following dataframe, I want to get the index string by specifying the row number, which is the same as the Value column.
               Value
global loads       0
global stores      1
local loads        2
For example, `df.iloc[1].index.name` should return "global stores" but the output is `None`. Any idea about that?
Regards,
Mahmood
Re: Returning the index of a row in dataframe
>>> df.iloc[1].name
Correct. I also see that 'df.index[1]' works fine. Thanks.
Regards,
Mahmood
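For reference, both lookups mentioned in this exchange can be sketched on a frame rebuilt by hand from the table in the question (the construction below is an assumption about how the original frame looked). `df.iloc[1]` is a Series whose own index holds the column labels, so its `.index.name` is unset, while its `.name` carries the row label.

```python
import pandas as pd

# Frame rebuilt from the table in the question.
df = pd.DataFrame(
    {"Value": [0, 1, 2]},
    index=["global loads", "global stores", "local loads"],
)

label_from_index = df.index[1]           # positional lookup on the index itself
label_from_row = df.iloc[1].name         # the row Series is named after its label
row_index_name = df.iloc[1].index.name   # name of the column index, which is unset

print(label_from_index, label_from_row, row_index_name)
```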
Using astype(int) for strings with thousand separator
Hi
While reading a csv file, some cells have values like '1,024', that is, they contain the thousands separator ','. Therefore, when I want to process them with
row = df.iloc[0].astype(int)
I get the following error
ValueError: invalid literal for int() with base 10: '1,024'
How can I fix that? Any idea?
Regards,
Mahmood
Re: Using astype(int) for strings with thousand separator
> (see
> https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
Got it. Thanks.
Regards,
Mahmood
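The quoted reply only points at the read_csv documentation, so the following is a sketch of what it presumably refers to: the `thousands` parameter of pandas.read_csv. The sample data and column names are made up for illustration. A second variant strips the separator from strings that are already loaded.

```python
import io

import pandas as pd

# Hypothetical input: numeric cells written with a thousands separator.
csv_text = 'Name,Count\nK1,"1,024"\nK2,"2,048"\n'

# Let the parser remove the separator while reading.
df = pd.read_csv(io.StringIO(csv_text), thousands=",")
counts = df["Count"].astype(int)

# Alternative for data that is already loaded as strings.
raw = pd.Series(["1,024", "7"])
cleaned = raw.str.replace(",", "", regex=False).astype(int)

print(counts.tolist(), cleaned.tolist())
```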
get_axes not present?
Hi
I am using the following versions
>>> import matplotlib
>>> print(matplotlib.__version__)
3.3.4
>>> import pandas as pd
>>> print(pd.__version__)
1.2.3
>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
In my code, I use axes in Pandas plot() like this (note that I omit some variables in this snippet to highlight the problem):
def plot_dataframe(df, cnt, axes):
    plt.subplot(2, 1, 1)
    ax1 = row.plot( fontsize=font_size, linewidth=line_width, markersize=marker_size, marker='o', title='Raw values', label=cnt, ax=axes[0] )
def plot_kernels(my_dict2):
    fig,axes = plt.subplots(2,1, figsize=(20, 15))
    should_plot = plot_dataframe(df, cnt, axes=axes)
    for ax in axes:
        ax.legend()
    plt.show()
However, I get this error:
Traceback (most recent call last):
File "process_csv.py", line 174, in
plot_kernels( my_dict2 )
File "process_csv.py", line 62, in plot_kernels
should_plot = plot_dataframe(df, cnt, axes=axes)
File "process_csv.py", line 34, in plot_dataframe
ax1 = row.plot( fontsize=font_size, linewidth=line_width, markersize=marker_size, marker='o', title='Raw values', label=cnt, ax=axes[0] )
File "/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_core.py", line 955, in __call__
return plot_backend.plot(data, kind=kind, **kwargs)
File "/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/__init__.py", line 61, in plot
plot_obj.generate()
File "/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py", line 283, in generate
self._adorn_subplots()
File "/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py", line 483, in _adorn_subplots
all_axes = self._get_subplots()
File "/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py", line 903, in _get_subplots
ax for ax in self.axes[0].get_figure().get_axes() if isinstance(ax, Subplot)
AttributeError: 'NoneType' object has no attribute 'get_axes'
I guess there is a mismatch between versions. Is there any workaround for that?
Regards,
Mahmood
Re: get_axes not present?
>It's not saying get_axes doesn't exist because of version skew, it's
>saying that the object returned by the call to the left of it
>(get_figure()) returned None, and None doesn't have methods
>
>Something isn't set up right, but you'll have to trace that through.
Do you think the following statement is correct?
ax1 = row.plot( fontsize=font_size, linewidth=line_width, markersize=marker_size, marker='o', title='Raw values', label=cnt, ax=axes[0] )
ax1.set_ylabel( yax_label, fontsize=font_size )
As you can see I put the result of plot() to ax1 and then use some functions, e.g. set_ylabel(). On the other hand, I have specified `label` and `ax` in plot(), too.
Regards,
Mahmood
Re: get_axes not present?
>And what is the result of plot()? Is it a valid object, or is it None?
Well the error happens on the plot() line. I tried to print some information
like this:
print("axes=", axes)
print("axes[0]=", axes[0])
print("cnt=", cnt)
print("row=", row)
ax1 = row.plot( fontsize=font_size, linewidth=line_width,
markersize=marker_size, marker='o', title='Raw values', label=cnt, ax=axes[0] )
The output looks like
axes= [ ]
axes[0]= AxesSubplot(0.125,0.53;0.775x0.35)
cnt= 1
row= 1 278528
2 278528
3 278528
4 278528
5 278528
...
5604 278528
5605 278528
5606 278528
5607 278528
5608 278528
Name: 4, Length: 5608, dtype: int64
Traceback (most recent call last):
File "process_csv.py", line 178, in
plot_kernels( my_dict2 )
File "process_csv.py", line 66, in plot_kernels
should_plot = plot_dataframe(df, cnt, axes)
File "process_csv.py", line 38, in plot_dataframe
ax1 = row.plot( fontsize=font_size, linewidth=line_width,
markersize=marker_size, marker='o', title='Raw values', label=cnt, ax=axes[0] )
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_core.py",
line 955, in __call__
return plot_backend.plot(data, kind=kind, **kwargs)
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/__init__.py",
line 61, in plot
plot_obj.generate()
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py",
line 283, in generate
self._adorn_subplots()
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py",
line 483, in _adorn_subplots
all_axes = self._get_subplots()
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py",
line 903, in _get_subplots
ax for ax in self.axes[0].get_figure().get_axes() if isinstance(ax, Subplot)
AttributeError: 'NoneType' object has no attribute 'get_axes'
The error is weird. I am stuck on this error...
Any thoughts on that?
Regards,
Mahmood
Re: get_axes not present?
>The best way to get
>assistance here on the list is to create a minimal, self-contained,
>run-able, example program that you can post in its entirety here that
>demonstrates the issue.
I created a sample code with input. Since the code processes a csv file to
group input rows, I also included those in this minimal code but those
preprocesses are not buggy. In this sample code, I used print() to print
necessary information. The error exists in the plot function. I tested the
dictionary build before that and it is fine.
Code is available at https://pastebin.com/giAnjJDV and the input file
(test.batch.csv) is available at https://pastebin.com/Hdp4Wt9B
The run command is "python3 test.py". With the versions specified in my system,
here is the full output:
$ python3 test.py
Reading file...
matplotlib version = 3.3.4
pandas version = 1.2.3
sys version sys.version_info(major=3, minor=8, micro=10, releaselevel='final',
serial=0)
Original dictionary = {'dummy': Value
M1 0
M2 0
M3 0, 'K1::foo(bar::z(x,u))': Value Value
0 10 2
1 5 2
2 10 2, 'K2::foo()': Value
0 20
1 10
2 15, 'K3::foo(baar::y(z,u))': Value
0 12
1 13
2 14, 'K3::foo(bar::y(z,u))': Value
0 6
1 7
2 8}
New dictionary for plot = {'dummy': Value
M1 0
M2 0
M3 0, 'K1::foo(bar::z(x,u))': Value Value
0 10 2
1 5 2
2 10 2, 'K3::foo(bar::y(z,u))': Value
0 6
1 7
2 8}
Key is K1::foo(bar::z(x,u)) -> df is Value Value
0 10 2
1 5 2
2 10 2
axes= [ ]
axes[0]= AxesSubplot(0.125,0.53;0.775x0.35)
cnt= 1
row= 1 10
2 2
Name: 0, dtype: int64
Traceback (most recent call last):
File "test.py", line 74, in
plot_kernels(my_dict2)
File "test.py", line 52, in plot_kernels
plot_dataframe(df, cnt, axes)
File "test.py", line 36, in plot_dataframe
ax1 = row.plot(label=cnt, ax=axes[0], marker='o') # Line chart
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_core.py",
line 955, in __call__
return plot_backend.plot(data, kind=kind, **kwargs)
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/__init__.py",
line 61, in plot
plot_obj.generate()
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py",
line 283, in generate
self._adorn_subplots()
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py",
line 483, in _adorn_subplots
all_axes = self._get_subplots()
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py",
line 903, in _get_subplots
ax for ax in self.axes[0].get_figure().get_axes() if isinstance(ax, Subplot)
AttributeError: 'NoneType' object has no attribute 'get_axes'
I am pretty sure that there is a version mismatch because on a system with
Pandas 1.3.3 the output should be like https://imgur.com/a/LZ9eAzl
Any feedback is appreciated.
Regards,
Mahmood
Re: get_axes not present?
>I installed the latest pandas, although on Python 3.10, and the script
>worked without a problem.
Yes, as I wrote, it works with 1.3.3 but mine is 1.2.3.
I am trying to keep the current version because of the possible future consequences. In the end maybe I have to upgrade the pandas.
Regards,
Mahmood
Re: get_axes not present?
>Your example isn't minimal enough for me to be able to pin it down any
>better than that, though.
Chris,
I was able to simplify it even further. Please look at this:
$ cat test.batch.csv
Value,Value
10,2
5,2
10,2
$ cat test.py
import pandas as pd
import csv,sys
import matplotlib
import matplotlib.pyplot as plt
df = pd.read_csv('test.batch.csv')
print(df)
def plot_dataframe(df, cnt, axes):
    df.columns = range(1, len(df.columns)+1) # Ignore the column header
    row = df.iloc[0].astype(int) # First row in the dataframe
    plt.subplot(2, 1, 1)
    print("axes=", axes)
    print("axes[0]=", axes[0])
    print("cnt=", cnt)
    print("row=", row)
    ax1 = row.plot(label=cnt, ax=axes[0], marker='o') # Line chart
    ax1.set_ylabel( 'test', fontsize=15 )
    plt.subplot(2, 1, 2)
    df2 = row.value_counts()
    df2.reindex().plot(kind='bar', label=cnt, ax=axes[1]) # Histogram
def plot_kernels(df):
    fig,axes = plt.subplots(2,1, figsize=(20, 15))
    cnt=1
    plot_dataframe(df, cnt, axes)
    cnt = cnt + 1
    for ax in axes:
        ax.legend()
    plt.show()
print("matplotlib version = ", matplotlib.__version__)
print("pandas version = ", pd.__version__)
print("sys version", sys.version_info)
plot_kernels(df)
And the output is
$ python3 test.py
Value Value.1
0 10 2
1 5 2
2 10 2
matplotlib version = 3.3.4
pandas version = 1.2.3
sys version sys.version_info(major=3, minor=8, micro=10, releaselevel='final',
serial=0)
axes= [ ]
axes[0]= AxesSubplot(0.125,0.53;0.775x0.35)
cnt= 1
row= 1 10
2 2
Name: 0, dtype: int64
Traceback (most recent call last):
File "test.py", line 41, in
plot_kernels(df)
File "test.py", line 29, in plot_kernels
plot_dataframe(df, cnt, axes)
File "test.py", line 19, in plot_dataframe
ax1 = row.plot(label=cnt, ax=axes[0], marker='o') # Line chart
File
"/home/mnaderan/.local/lib/python3.8/site-packages/pandas/plotting/_core.py",
line 955, in __call__
return plot_backend.plot(data, kind=kind, **kwargs)
File
"/home/mnaderan/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/__init__.py",
line 61, in plot
plot_obj.generate()
File
"/home/mnaderan/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py",
line 283, in generate
self._adorn_subplots()
File
"/home/mnaderan/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py",
line 483, in _adorn_subplots
all_axes = self._get_subplots()
File
"/home/mnaderan/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py",
line 903, in _get_subplots
ax for ax in self.axes[0].get_figure().get_axes() if isinstance(ax, Subplot)
AttributeError: 'NoneType' object has no attribute 'get_axes'
Any idea about that?
Regards,
Mahmood
About get_axes() in Pandas 1.2.3
Hi
I asked a question some days ago, but due to the lack of a minimal reproducing
example, the topic got a bit messy. So, I have decided to ask it in a new topic
with a clear minimal example.
With Pandas 1.2.3 and Matplotlib 3.3.4, the following plot() call raises an
error and I don't know what is wrong with it.
import pandas as pd
import csv,sys
import matplotlib
import matplotlib.pyplot as plt
df = pd.read_csv('test.batch.csv')
print(df)
print("matplotlib version = ", matplotlib.__version__)
print("pandas version = ", pd.__version__)
print("sys version", sys.version_info)
fig,axes = plt.subplots(2,1, figsize=(20, 15))
df.columns = range(1, len(df.columns)+1) # Ignore the column header
row = df.iloc[0].astype(int) # First row in the dataframe
plt.subplot(2, 1, 1)
print("axes=", axes)
print("axes[0]=", axes[0])
print("row=", row)
ax1 = row.plot(ax=axes[0]) # Line chart <-- ERROR
ax1.set_ylabel( 'test' )
plt.subplot(2, 1, 2)
df2 = row.value_counts()
df2.reindex().plot(kind='bar', ax=axes[1]) # Histogram
plt.show()
The output is
$ cat test.batch.csv
Value,Value
10,2
5,2
10,2
$ python3 test.py
Value Value.1
0 10 2
1 5 2
2 10 2
matplotlib version = 3.3.4
pandas version = 1.2.3
sys version sys.version_info(major=3, minor=8, micro=10, releaselevel='final',
serial=0)
axes= [ ]
axes[0]= AxesSubplot(0.125,0.53;0.775x0.35)
row= 1 10
2 2
Name: 0, dtype: int64
Traceback (most recent call last):
File "test.py", line 20, in
ax1 = row.plot(ax=axes[0]) # Line chart
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_core.py",
line 955, in __call__
return plot_backend.plot(data, kind=kind, **kwargs)
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/__init__.py",
line 61, in plot
plot_obj.generate()
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py",
line 283, in generate
self._adorn_subplots()
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py",
line 483, in _adorn_subplots
all_axes = self._get_subplots()
File
"/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py",
line 903, in _get_subplots
ax for ax in self.axes[0].get_figure().get_axes() if isinstance(ax, Subplot)
AttributeError: 'NoneType' object has no attribute 'get_axes'
Although the plot() crashes, I see that row and axes variables are valid. So, I
wonder what is the workaround for this code without upgrading Pandas or
Matplotlib. Any idea?
Regards,
Mahmood
Re: About get_axes() in Pandas 1.2.3
>I can help you narrow it down a bit. The problem actually occurs inside
>this function call somehow. You can verify this by doing this:
>
>
>fig,axes = plt.subplots(2,1, figsize=(20, 15))
>
>print ("axes[0].get_figure()=",axes[0].get_figure())
>
>You'll find that get_figure() is returning None, when it should be
>returning Figure(2000x1500). So plt.subplots is not doing something
>properly which was corrected at some point. Oddly enough, with pandas
>1.1.4 and matplotlib 3.2.2 (which is what my system has by default),
>there is no error, although the graph is blank.
>
>In my venv, when I upgrade matplotlib from 3.3.4 to 3.5, the problem
>also goes away. 3.4.0 also works.
>
>Honestly your solution is going to be to provide a virtual environment
>with your script. That way you can bundle the appropriate dependencies
>without modifying anything on the host system.
Thanks for the feedback. You are right.
I agree that virtualenv is the safest method at this time.
Regards,
Mahmood
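A possible workaround that needs no upgrade, offered as an assumption that the thread does not confirm: in Matplotlib releases before 3.4, calling plt.subplot(2, 1, 1) after plt.subplots(2, 1) can create a fresh Axes and remove the overlapping one returned by plt.subplots, which would leave axes[0] detached from its figure and explain get_figure() returning None. Since the target Axes is already passed through ax=, the plt.subplot calls can simply be left out. The sketch below uses an in-memory CSV and the non-interactive Agg backend so that it runs without a display; both are choices made for this example only.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen for this example only
import matplotlib.pyplot as plt
import pandas as pd

# Same data as test.batch.csv in the thread, read from memory.
df = pd.read_csv(io.StringIO("Value,Value\n10,2\n5,2\n10,2\n"))
df.columns = range(1, len(df.columns) + 1)  # ignore the column header
row = df.iloc[0].astype(int)                # first row in the dataframe

fig, axes = plt.subplots(2, 1, figsize=(8, 6))

# No plt.subplot(...) calls: each plot is directed to its Axes via ax=.
ax1 = row.plot(ax=axes[0], marker="o")              # line chart
ax1.set_ylabel("test")
row.value_counts().plot(kind="bar", ax=axes[1])     # histogram-like bar chart

# Render into a buffer instead of calling plt.show().
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```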
Extracting dataframe column with multiple conditions on row values
Hi
I have a csv file like this
V0,V1,V2,V3
4,1,1,1
6,4,5,2
2,3,6,7
And I want to search two rows for a match and find the column. For example, I want to search row[0] for 1 and row[1] for 5. The corresponding column is V2 (which is the third column). Then I want to return the value at row[2] and the found column. The result should be 6 then.
I can manually extract the specified rows (with index 0 and 1 which are fixed) and manually iterate over them like arrays to find a match. Then I
key1 = 1
key2 = 5
row1 = df.iloc[0] # row=[4,1,1,1]
row2 = df.iloc[1] # row=[6,4,5,2]
for i in range(len(row1)):
    if row1[i] == key1:
        for j in range(len(row2)):
            if row2[j] == key2:
                res = df.iloc[:,j]
                print(res) # 6
Is there any way to use built-in function for a more efficient code?
Regards,
Mahmood
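One way to express this lookup without explicit loops is a boolean mask over the columns, sketched here under the assumption that both conditions must hold in the same column. The variable names are illustrative.

```python
import io

import pandas as pd

# The sample file from the question, read from memory.
df = pd.read_csv(io.StringIO("V0,V1,V2,V3\n4,1,1,1\n6,4,5,2\n2,3,6,7\n"))

key1 = 1
key2 = 5

# True for every column whose row 0 equals key1 and whose row 1 equals key2.
mask = (df.iloc[0] == key1) & (df.iloc[1] == key2)

matching_columns = df.columns[mask.to_numpy()]
result = df.loc[2, matching_columns]  # values of row 2 in the matching columns

print(list(matching_columns), result.tolist())
```

If several columns satisfy both conditions, `result` holds one value per matching column, so the caller decides how to handle ties.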
Writing a string with comma in one column of CSV file
Hi,
I use the following line to write some information to a CSV file which is comma delimited.
f = open(output_file, 'w', newline='')
wr = csv.writer(f)
...
f.write(str(n) + "," + str(key) + "\n" )
Problem is that key is a string which may contain ',' and this causes the final CSV file to have more than 2 columns, while I want to write the whole key as a single column.
I know that wr.writerow([key]) writes the entire key in one column, but I would like to do the same with write(). Any idea to fix that?
Regards,
Mahmood
Re: Writing a string with comma in one column of CSV file
Right. I was also able to put all columns in a string and then use writerow(). Thanks.
Regards,
Mahmood
On Saturday, January 15, 2022, 10:33:08 PM GMT+1, alister via Python-list wrote:
you need to quote the data
the easiest way to ensure this is to include the QUOTE_ALL option when opening the file
wr = csv.writer(output, quoting=csv.QUOTE_ALL)
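As a small sketch of the writerow() route mentioned above: passing both fields as a list lets the csv module quote any field that contains the delimiter, so the key stays in one column. The values and the in-memory buffer are stand-ins for the real data and output file.

```python
import csv
import io

# In-memory stand-in for the output file.
buffer = io.StringIO()
writer = csv.writer(buffer)

n = 1
key = "K1::foo(bar::z(x,u))"  # contains a comma

# The default QUOTE_MINIMAL quotes only fields that need it.
writer.writerow([n, key])
written = buffer.getvalue()

# Reading the line back yields exactly two columns.
rows = list(csv.reader(io.StringIO(written)))
print(written, rows)
```

With quoting=csv.QUOTE_ALL, as suggested in the quoted reply, every field would be quoted, including the number.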
Unable to install "collect" via pip3
Hi
I can install collect with pip for python2.7
$ pip install --user collect
Collecting collect
Using cached https://files.pythonhosted.org/packages/cf/5e/c0f0f51d081665374a2c219ea4ba23fb1e179b70dded96dc16606786d828/collect-0.1.1.tar.gz
Collecting couchdbkit>=0.5.7 (from collect)
Using cached https://files.pythonhosted.org/packages/a1/13/9e9ff695a385c44f62b4766341b97f2bd8b596962df2a0beabf358468b70/couchdbkit-0.6.5.tar.gz
Collecting restkit>=4.2.2 (from couchdbkit>=0.5.7->collect)
Downloading https://files.pythonhosted.org/packages/76/b9/d90120add1be718f853c53008cf5b62d74abad1d32bd1e7097dd913ae053/restkit-4.2.2.tar.gz (1.3MB)
100% || 1.3MB 633kB/s
Collecting http-parser>=0.8.3 (from restkit>=4.2.2->couchdbkit>=0.5.7->collect)
Downloading https://files.pythonhosted.org/packages/07/c4/22e3c76c2313c26dd5f84f1205b916ff38ea951aab0c4544b6e2f5920d64/http-parser-0.8.3.tar.gz (83kB)
100% || 92kB 2.4MB/s
Collecting socketpool>=0.5.3 (from restkit>=4.2.2->couchdbkit>=0.5.7->collect)
Downloading https://files.pythonhosted.org/packages/d1/39/fae99a735227234ffec389b252c6de2bc7816bf627f56b4c558dc46c85aa/socketpool-0.5.3.tar.gz
Building wheels for collected packages: collect, couchdbkit, restkit, http-parser, socketpool
Running setup.py bdist_wheel for collect ... done
Stored in directory: /home/mnaderan/.cache/pip/wheels/b9/7c/7c/b09b334cc0e27b4f63ee9f6f19ca1f3db8672666a7e0f3d9cd
Running setup.py bdist_wheel for couchdbkit ... done
Stored in directory: /home/mnaderan/.cache/pip/wheels/f6/05/1b/f8f576ef18564bc68ab6e64f405e1263448036208cafb221e0
Running setup.py bdist_wheel for restkit ... done
Stored in directory: /home/mnaderan/.cache/pip/wheels/48/c5/32/d0d25fb272791a68c49c26150f332d9b9492d0bc9ea0cdd2c7
Running setup.py bdist_wheel for http-parser ... done
Stored in directory: /home/mnaderan/.cache/pip/wheels/22/db/06/cb609a3345e7aa87206de160f00cc6af364650d1139d904a25
Running setup.py bdist_wheel for socketpool ... done
Stored in directory: /home/mnaderan/.cache/pip/wheels/93/f6/8c/65924848766618647078cb66b1d964e8b80876536e84517469
Successfully built collect couchdbkit restkit http-parser socketpool
Installing collected packages: http-parser, socketpool, restkit, couchdbkit, collect
Successfully installed collect-0.1.1 couchdbkit-0.6.5 http-parser-0.8.3 restkit-4.2.2 socketpool-0.5.3
However, pip3 fails with this error
$ pip3 install --user collect
Collecting collect
Using cached https://files.pythonhosted.org/packages/cf/5e/c0f0f51d081665374a2c219ea4ba23fb1e179b70dded96dc16606786d828/collect-0.1.1.tar.gz
Collecting couchdbkit>=0.5.7 (from collect)
Using cached https://files.pythonhosted.org/packages/a1/13/9e9ff695a385c44f62b4766341b97f2bd8b596962df2a0beabf358468b70/couchdbkit-0.6.5.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-build-qf95n0tt/couchdbkit/setup.py", line 25, in
long_description = file(
NameError: name 'file' is not defined
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-qf95n0tt/couchdbkit/
I can not figure out what is the problem. Any way to fix that?
More info:
$ which python
/usr/bin/python
$ ls -l /usr/bin/python
lrwxrwxrwx 1 root root 9 Apr 16 2018 /usr/bin/python -> python2.7
$ which python3
/usr/bin/python3
$ ls -l /usr/bin/python3
lrwxrwxrwx 1 root root 9 Jun 21 2018 /usr/bin/python3 -> python3.6
Regards,
Mahmood
Re: Unable to install "collect" via pip3
Yes, thank you. The package is not compatible with 3.x.
Regards,
Mahmood
On Saturday, December 21, 2019, 1:40:29 AM GMT+3:30, Barry wrote:
> On 20 Dec 2019, at 15:27, Mahmood Naderan via Python-list wrote:
>
> File "/tmp/pip-build-qf95n0tt/couchdbkit/setup.py", line 25, in
> long_description = file(
> NameError: name 'file' is not defined
My guess is that file is python 2 only. Couchdbkit needs porting to python 3.
Barry
Grepping words for match in a file
Hi
I have some lines in a text file like
ADD R1, R2
ADD3 R4, R5, R6
ADD.MOV R1, R2, [0x10]
If I grep words with this code
for line in fp:
    if my_word in line:
Then if my_word is "ADD", I get 3 matches. However, if I grep word with this code
for line in fp:
    for word in line.split():
        if my_word == word:
Then I get only one match which is ADD R1, R2.
Actually I want to get 2 matches, ADD R1, R2 and ADD.MOV R1, R2, [0x10], because these two lines are actually "ADD" instructions. However, "ADD3" is something else.
How can I fix the code for that purpose?
--
Regards,
Mahmood
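A minimal sketch of one way to get the two wanted matches, assuming the mnemonic is always the first whitespace-separated token on a line and that anything after a '.' in that token is a modifier (both are assumptions drawn from the three sample lines, and the function name is hypothetical):

```python
def matching_lines(lines, my_word):
    """Return the lines whose base opcode equals my_word."""
    matches = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # "ADD.MOV" -> "ADD", "ADD3" stays "ADD3", "ADD" stays "ADD"
        base_opcode = fields[0].split(".")[0]
        if base_opcode == my_word:
            matches.append(line.rstrip("\n"))
    return matches


sample = [
    "ADD R1, R2\n",
    "ADD3 R4, R5, R6\n",
    "ADD.MOV R1, R2, [0x10]\n",
]
found = matching_lines(sample, "ADD")
print(found)
```

The same function accepts an open file object in place of the list, since it only iterates over lines.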
Install python via MS batch file
Hello,
I have downloaded python-3.6.1-amd64.exe and it is fine to install it through GUI. However, I want to write a batch file to install it via command line. Since the installation process is interactive, it seems that the auto-install batch file is difficult. What I want to do is:
set python path
install in the default location.
Any idea about that?
Regards,
Mahmood
packaging python code
Hi,
I have a simple piece of code which uses two libraries (numpy and openpyxl). The script is called from another application. Currently, if someone wants to run my program, he has to first install the python completely via its installer. Is there any way to pack my .py with all required libraries and create a self running package? Something like building exe file with static libraries. Therefore, the user won't install any thing manually.
Please let me know if there is such procedure.
Regards,
Mahmood
Re: packaging python code
OK. I did that but it fails! Please see the stack
D:\ThinkPad\Documents\NetBeansProjects\ExcelTest>pyinstaller exread.py
96 INFO: PyInstaller: 3.2.1
96 INFO: Python: 3.6.1
98 INFO: Platform: Windows-10-10.0.14393-SP0
103 INFO: wrote D:\ThinkPad\Documents\NetBeansProjects\ExcelTest\exread.spec
109 INFO: UPX is not available.
111 INFO: Extending PYTHONPATH with paths
['D:\\ThinkPad\\Documents\\NetBeansProjects\\ExcelTest',
'D:\\ThinkPad\\Documents\\NetBeansProjects\\ExcelTest']
112 INFO: checking Analysis
113 INFO: Building Analysis because out00-Analysis.toc is non existent
113 INFO: Initializing module dependency graph...
117 INFO: Initializing module graph hooks...
119 INFO: Analyzing base_library.zip ...
Traceback (most recent call last):
  File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\Scripts\pyinstaller-script.py", line 11, in <module>
    load_entry_point('PyInstaller==3.2.1', 'console_scripts', 'pyinstaller')()
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\__main__.py", line 90, in run
    run_build(pyi_config, spec_file, **vars(args))
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\__main__.py", line 46, in run_build
    PyInstaller.building.build_main.main(pyi_config, spec_file, **kwargs)
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\building\build_main.py", line 788, in main
    build(specfile, kw.get('distpath'), kw.get('workpath'), kw.get('clean_build'))
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\building\build_main.py", line 734, in build
    exec(text, spec_namespace)
  File "<string>", line 16, in <module>
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\building\build_main.py", line 212, in __init__
    self.__postinit__()
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\building\datastruct.py", line 161, in __postinit__
    self.assemble()
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\building\build_main.py", line 317, in assemble
    excludes=self.excludes, user_hook_dirs=self.hookspath)
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\depend\analysis.py", line 560, in initialize_modgraph
    graph.import_hook(m)
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\lib\modulegraph\modulegraph.py", line 1509, in import_hook
    source_package, target_module_partname, level)
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\lib\modulegraph\modulegraph.py", line 1661, in _find_head_package
    target_module_headname, target_package_name, source_package)
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\depend\analysis.py", line 209, in _safe_import_module
    module_basename, module_name, parent_package)
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\lib\modulegraph\modulegraph.py", line 2077, in _safe_import_module
    module_name, file_handle, pathname, metadata)
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\lib\modulegraph\modulegraph.py", line 2167, in _load_module
    self._scan_code(m, co, co_ast)
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\lib\modulegraph\modulegraph.py", line 2585, in _scan_code
    module, module_code_object, is_scanning_imports=False)
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\lib\modulegraph\modulegraph.py", line 2831, in _scan_bytecode
    global_attr_name = get_operation_arg_name()
  File "c:\users\thinkpad\appdata\local\programs\python\python36\lib\site-packages\PyInstaller\lib\modulegraph\modulegraph.py", line 2731, in get_operation_arg_name
    return module_code_object.co_names[co_names_index]
IndexError: tuple index out of range
D:\ThinkPad\Documents\NetBeansProjects\ExcelTest>
Regards,
Mahmood
On Monday, May 8, 2017 5:07 PM, Lutz Horn wrote:
> Is there any way to pack my .py with all required libraries and create a self
> running package?
Take a look at PyInstaller:
* http://www.pyinstaller.org/
* https://pyinstaller.readthedocs.io/en/stable/
Lutz
Out of memory while reading excel file
Hello,
The following code, which uses openpyxl and numpy, fails to read large Excel
(xlsx) files. The file is 20 MB and contains 100K rows and 50 columns.
W = load_workbook(fname, read_only = True)
p = W.worksheets[0]
a=[]
m = p.max_row
n = p.max_column
np.array([[i.value for i in j] for j in p.rows])
How can I fix that? I am stuck on this problem. For medium-sized files (16K
rows and 50 columns) it works fine.
Regards,
Mahmood
Re: Out of memory while reading excel file
Thanks for your reply. The openpyxl part (reading the workbook) works fine. I
printed some debug information and found that when it reaches the np.array
call, after some 10 seconds, the memory usage goes up sharply.
So, I think numpy is unable to manage the memory.
Regards,
Mahmood
On Wednesday, May 10, 2017 7:25 PM, Peter Otten wrote:
Mahmood Naderan via Python-list wrote:
> Hello,
>
> The following code which uses openpyxl and numpy, fails to read large
> Excel (xlsx) files. The file is 20Mb which contains 100K rows and 50
> columns.
>
> W = load_workbook(fname, read_only = True)
> p = W.worksheets[0]
> a=[]
> m = p.max_row
> n = p.max_column
> np.array([[i.value for i in j] for j in p.rows])
>
> How can I fix that? I have stuck at this problem. For medium sized files
> (16K rows and 50 columns) it is fine.
The docs at
https://openpyxl.readthedocs.io/en/default/optimized.html#read-only-mode
promise "(near) constant memory consumption" for the sample script below:
from openpyxl import load_workbook
wb = load_workbook(filename='large_file.xlsx', read_only=True)
ws = wb['big_data']
for row in ws.rows:
    for cell in row:
        print(cell.value)
If you change only the file and worksheet name to your needs -- does the
script run to completion in reasonable time (redirect stdout to /dev/null)
and with reasonable memory usage?
If it does you may be wasting memory elsewhere; otherwise you might need to
convert the xlsx file to csv using your spreadsheet application before
processing the data in Python.
Re: Out of memory while reading excel file
Well, actually the cells are treated as strings and not as integer or float
numbers.
One way to overcome this is to get the number of rows, split them into 4 or 5
arrays, and then process them. However, I was looking for a better solution.
I have read that large Excel files are on the order of a million rows. Mine is
about 100K. Currently, the task manager shows about 4 GB of RAM usage while
working with numpy.
Regards,
Mahmood
On Wednesday, May 10, 2017, 3:48 PM, Peter Otten wrote:
Mahmood Naderan via Python-list wrote:
> Thanks for your reply. The openpyxl part (reading the workbook) works
> fine. I printed some debug information and found that when it reaches the
> np.array, after some 10 seconds, the memory usage goes high.
>
> So, I think numpy is unable to manage the memory.
Hm, I think numpy is designed to manage huge arrays if you have enough RAM.
Anyway: are all values of the same type? Then the numpy array may be kept
much smaller than in the general case (I think). You can also avoid the
intermediate list of lists:
wb = load_workbook(filename='beta.xlsx', read_only=True)
ws = wb['alpha']
a = numpy.zeros((ws.max_row, ws.max_column), dtype=float)
for y, row in enumerate(ws.rows):
    a[y] = [cell.value for cell in row]
Re: Out of memory while reading excel file
Hi,
I will try your code. Meanwhile, I have to say that, as you pointed out earlier
and as stated in the documentation, numpy is designed to handle large arrays,
and that is the reason I chose it. If there is a better option, please let me
know.
Regards,
Mahmood
On Wednesday, May 10, 2017, 6:30 PM, Peter Otten wrote:
Mahmood Naderan via Python-list wrote:
> Well actually cells are treated as strings and not integer or float
> numbers.
May I ask why you are using numpy when you are dealing with strings? If you
provide a few details about what you are trying to achieve someone may be
able to suggest a workable approach.
Back-of-the-envelope considerations: 4GB / 5E6 cells amounts to
>>> 2**32 / (10**5 * 50)
858.9934592
about 850 bytes per cell, with an overhead of
>>> sys.getsizeof("")
49
that would be 800 ascii chars, down to 200 chars in the worst case. If your
strings are much smaller the problem lies elsewhere.
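The back-of-the-envelope estimate above can be reproduced with a few lines. This is only a sketch: the 4 GB figure and the 100K x 50 sheet size come from the thread, the helper name bytes_per_cell is made up for illustration, and the empty-string overhead reported by sys.getsizeof differs between Python versions, so the results are rough.

```python
import sys


def bytes_per_cell(total_bytes, rows, columns):
    """Average memory budget per cell if total_bytes is spread evenly."""
    return total_bytes / (rows * columns)


# Figures taken from the thread: about 4 GB in use, 100K rows, 50 columns.
budget = bytes_per_cell(2**32, 100_000, 50)

# Fixed cost of an empty str object on this interpreter (version dependent).
overhead = sys.getsizeof("")

# Rough string lengths that would explain the budget:
# 1 byte per character for ASCII text, 4 bytes per character at worst.
ascii_chars = budget - overhead
wide_chars = (budget - overhead) / 4

print("bytes per cell: %.1f" % budget)
print("empty str overhead: %d" % overhead)
print("approx. ASCII chars per cell: %.0f" % ascii_chars)
print("approx. chars per cell at 4 bytes each: %.0f" % wide_chars)
```

If the real cell contents are far shorter than these lengths, the strings alone probably do not account for the memory use.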
Re: Out of memory while reading excel file
Hi,
I am confused by that. If you are saying that numpy is not suitable for my case
and may have a large overhead, what is the alternative? Or do you mean that
numpy is a good choice here, as long as we reduce its overhead?
Regards,
Mahmood
Re: Out of memory while reading excel file
>a = numpy.zeros((ws.max_row, ws.max_column), dtype=float)
>for y, row in enumerate(ws.rows):
>    a[y] = [cell.value for cell in row]
Peter,
When I used this code, it gave me an error saying that it cannot convert a
string to float for the first cell. All cells are strings.
Regards,
Mahmood
Re: Out of memory while reading excel file
Hi,
I used the old-fashioned coding style to create a matrix and read/add the
cells.
W = load_workbook(fname, read_only = True)
p = W.worksheets[0]
m = p.max_row
n = p.max_column
arr = np.empty((m, n), dtype=object)
# openpyxl rows and columns are 1-based; the numpy array is 0-based
for r in range(1, m + 1):
    for c in range(1, n + 1):
        d = p.cell(row=r, column=c)
        arr[r - 1, c - 1] = d.value
However, the operation is very slow. I printed the row number to see how
things are going. It took 2 minutes to add 200 rows and about 10 minutes to add
the next 200 rows.
Regards,
Mahmood
Re: Out of memory while reading excel file
I wrote this:
a = np.zeros((p.max_row, p.max_column), dtype=object)
for y, row in enumerate(p.rows):
    for cell in row:
        print (cell.value)
        a[y] = cell.value
        print (a[y])
For one of the cells, I see
NM_198576.3
['NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3']
There are 50 copies of NM_198576.3 in a[y], and 50 is the number of columns in
my Excel file (p.max_column).
The Excel file looks like
CHR1 11,202,100 NM_198576.3 PASS 3.08932G|B|C -. . .
Note that in each row, some cells are '-' or '.' only.
I want to read all cells as strings. Then I will write the matrix to a file
and my main code (Java) will process that.
I chose openpyxl for reading Excel files because Apache POI (a Java package
for manipulating Excel files) consumes huge amounts of memory even for medium
files.
So my Python script only transforms an xlsx file into a txt file, keeping the
cell positions and formats.
Any suggestions?
Regards,
Mahmood
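The repeated value is consistent with a[y] = cell.value assigning a single value to the whole row y, so every cell overwrites the entire row and the last cell wins. The sketch below shows per-cell placement instead. It deliberately uses plain lists rather than numpy, since all values are strings. The name rows_to_text_table, the sample rows, and the choice to turn empty cells (None) into empty strings are assumptions made for illustration. In the real script each row would come from the worksheet's cell values.

```python
def rows_to_text_table(rows):
    """Return a list of rows, each a list of strings, one entry per cell."""
    table = []
    for row in rows:
        # One string per cell; empty cells (None) become "" (an assumption).
        table.append(["" if value is None else str(value) for value in row])
    return table


# Hypothetical stand-in for the values read from each worksheet row.
sample_rows = [
    ("CHR1", "11,202,100", "NM_198576.3", "PASS", "-", "."),
    ("CHR1", None, "NM_198576.3", "PASS", ".", "-"),
]

table = rows_to_text_table(sample_rows)
print(table[0][2])   # the third cell of the first row only
print(table[1])
```

Because each cell lands in its own slot, a row keeps as many distinct values as it has columns.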
Re: Out of memory while reading excel file
Thanks. That code is so simple and works. However, there are things to be
considered. With the CSV format, cells in a row are separated by ',' and for
some cells it writes "" around the cell content.
So, if the Excel file looks like
CHR1 11,232,445
the output file looks like
CHR1,"11,232,445"
Is it possible to use a space as the delimiting character and omit the ""? I
ask because my Java code, which has to read the output file, would otherwise
have to do some extra work (using a space as the delimiter is the default and
much easier to work with).
I want
a[0][0] = CHR1
a[0][1] = 11,232,445
And both are strings. Is that possible?
Regards,
Mahmood
Re: Out of memory while reading excel file
Excuse me, I changed
csv.writer(outstream)
to
csv.writer(outstream, delimiter =' ')
It puts a space between cells and omits the "" around some content. However,
there is a new empty line between every two lines. In other words, the first
line is the first row of the Excel file, the second line is empty ("\n"), and
the third line is the second row of the Excel file.
Any thoughts?
Regards,
Mahmood
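A common cause of an empty line after every row on Windows is opening the output file in text mode without newline='': the csv module writes its own "\r\n" line endings and the text layer then translates the "\n" again. Whether that was the cause here is an assumption, since the thread does not record the fix. A minimal sketch, with write_rows and the sample rows as hypothetical names and data:

```python
import csv
import os
import tempfile


def write_rows(path, rows, delimiter=" "):
    """Write rows with the given delimiter, one output line per row."""
    # newline="" leaves line endings entirely to the csv module.
    with open(path, "w", newline="") as outstream:
        writer = csv.writer(outstream, delimiter=delimiter)
        writer.writerows(rows)


rows = [["CHR1", "11,232,445"], ["CHR2", "-"]]
path = os.path.join(tempfile.mkdtemp(), "output.txt")
write_rows(path, rows)

# Read back without newline translation to see the raw line endings.
with open(path, newline="") as instream:
    content = instream.read()
print(repr(content))
```

With a space as the delimiter, a cell that itself contains a space would still be quoted by the writer. If such cells can occur, a tab delimiter may be easier for the reading side.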
Re: Out of memory while reading excel file
Thanks a lot for the suggestions. It is now solved.
Regards,
Mahmood
Concatenating files in order
Hi,
There are some text files ending with _chunk_i where 'i' is an integer. For
example,
XXX_chunk_0
XXX_chunk_1
...
I want to concatenate them in order. The thing is that the total number of
files may vary. Therefore, I cannot hard-code the number in my Python script.
It has to be "for all files ending with _chunk_i".
Next, I can write
with open('final.txt', 'w') as outf:
    for fname in filenames:
        with open(fname) as inf:
            for line in inf:
                outf.write(line)
How can I specify the "filenames"?
Regards,
Mahmood
Re: Concatenating files in order
>Yup. Make a list of all the file names, write a key function that
>extracts the numbery bits, sort the list based on that key function, and
>go to town.
>
>Alternatively, when you create the files in the first place, make sure
>to use more leading zeros than you could possibly need.
>xxx_chunk_01 sorts less than xxx_chunk_10.
So, if I write
import glob
for f in glob.glob('*chunk*'):
    print(f)
it prints them in order. Is that order really sorted, or is it not guaranteed?
Regards,
Mahmood
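The order returned by glob.glob depends on the file system, so it should not be relied on. Even a plain sort is not enough here, because names are compared as text and _chunk_10 sorts before _chunk_2. A short sketch of the key-function approach suggested above, with chunk_number as a made-up helper name and the file names as sample data:

```python
def chunk_number(name):
    """Return the integer after the last underscore, e.g. 10 for 'XXX_chunk_10'."""
    return int(name.rsplit("_", 1)[-1])


names = ["XXX_chunk_10", "XXX_chunk_2", "XXX_chunk_0", "XXX_chunk_1"]

by_text = sorted(names)                      # plain string comparison
by_number = sorted(names, key=chunk_number)  # numeric comparison of the suffix

print(by_text)
print(by_number)
```

The helper assumes every name ends in an underscore followed by digits. Names that do not would raise ValueError and need filtering first.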
Re: Concatenating files in order
OK guys thank you very much. It is better to sort them first.
Here is what I wrote
files = glob.glob('*chunk*')
# sort on the integer after the last underscore
ordered = sorted(files, key=lambda name: int(name.split("_")[-1]))
with open('final.txt', 'w') as outf:
    for fname in ordered:
        with open(fname) as inf:
            for line in inf:
                outf.write(line)
and it works
Regards,
Mahmood
On Wednesday, May 24, 2017 1:20 AM, bartc wrote:
On 23/05/2017 20:55, Rob Gaddi wrote:
> Yup. Make a list of all the file names, write a key function that
> extracts the numbery bits, sort the list based on that key function, and
> go to town.
Is it necessary to sort them? If XXX is known, then presumably the first
file will be called XXX_chunk_0, the next XXX_chunk_1 and so on.
It would be possible to iterate over such a sequence of filenames, and
keep opening them then writing them to the output until there are no
more files. Or, if a list of matching files is obtained, the length of
the list will also give you the last filename.
(But this won't work if there are gaps in the sequence or the numeric
format is variable.)
--
bartc
Re: Concatenating files in order
Hi guys,
Cameron, thanks for the points. In fact the file name contains multiple '_'
characters. So, I appreciate what you recommended.
filenames = {}
for name in glob.glob('*chunk_*'):
    left, right = name.rsplit('_', 1)
    if left.endswith('chunk') and right.isdigit():
        filenames[int(right)] = filename
sorted_filenames = [ filenames[k] for k in sorted(filenames.keys()) ]
It seems that 'filename' should be 'right'.
Regards,
Mahmood
Re: Concatenating files in order
Thank you very much. I understand that now.
Regards,
Mahmood
On Friday, May 26, 2017 5:01 AM, Cameron Simpson wrote:
On 25May2017 20:37, Mahmood Naderan wrote:
>Cameron, thanks for the points. In fact the file name contains multiple '_'
>characters. So, I appreciate what you recommended.
>
>    filenames = {}
>    for name in glob.glob('*chunk_*'):
>        left, right = name.rsplit('_', 1)
>        if left.endswith('chunk') and right.isdigit():
>            filenames[int(right)] = filename
>    sorted_filenames = [ filenames[k] for k in sorted(filenames.keys()) ]
>
>It seems that 'filename' should be 'right'.
No, 'filename' should be 'name': the original filename. Thanks for the catch.
The idea is to have a map of int->filename so that you can open the files in
numeric order. So 'right' is just the numeric suffix - you need 'name' for the
open() call.
Cheers,
Cameron Simpson
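Putting the correction together with the earlier concatenation loop gives something like the sketch below. It follows the int-to-name map described above. The function name concatenate_chunks, the output name, and the demo files created in a temporary directory are assumptions for illustration only.

```python
import glob
import os
import tempfile


def concatenate_chunks(directory, output_name="final.txt"):
    """Join files named *chunk_<int> in numeric order; return the names used."""
    by_number = {}
    for name in glob.glob(os.path.join(directory, "*chunk_*")):
        left, right = name.rsplit("_", 1)
        if left.endswith("chunk") and right.isdigit():
            by_number[int(right)] = name  # keep the full name for open()
    ordered = [by_number[k] for k in sorted(by_number)]
    with open(os.path.join(directory, output_name), "w") as outf:
        for name in ordered:
            with open(name) as inf:
                for line in inf:
                    outf.write(line)
    return ordered


# Demo with made-up files whose names contain several underscores.
workdir = tempfile.mkdtemp()
for i in (0, 1, 2, 10):
    with open(os.path.join(workdir, "my_data_chunk_%d" % i), "w") as f:
        f.write("chunk %d\n" % i)

used = concatenate_chunks(workdir)
with open(os.path.join(workdir, "final.txt")) as f:
    result = f.read()
print(result)
```

Gaps in the numbering are harmless with this approach, because only the files that exist end up in the map.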
embed a package for proper fun script
Hello,
How is it possible to embed a package in my project? I mean, in my Python
script I have written
import openpyxl
The user may not have installed that package and may not know what pip is!
Please let me know the instructions or any document regarding that.
Regards,
Mahmood
Re: embed a package for proper fun script
No idea?...
Regards,
Mahmood
On Tuesday, May 30, 2017 1:06 AM, Mahmood Naderan via Python-list wrote:
Hello,
How it is possible to embed a package in my project? I mean, in my python
script I have written
import openpyxl
So, the user may not have installed that package and doesn't understand what
is pip!
Please let me know the instructions or any document regarding that.
Regards,
Mahmood
Python not able to find package but it is installed
Hello,
Although I have installed a package via pip on CentOS 6.6, the Python
interpreter still says there is no such package! Please see the output below:
$ python exread2.py input.xlsx tmp/output
Traceback (most recent call last):
  File "/home/mahmood/excetest/exread2.py", line 1, in <module>
    from openpyxl import load_workbook
ImportError: No module named openpyxl
$ pip install openpyxl
DEPRECATION: Python 2.6 is no longer supported by the Python core team, please upgrade your Python. A future version of pip will drop support for Python 2.6
Requirement already satisfied: openpyxl in /opt/rocks/lib/python2.6/site-packages ...
$ ls -l /opt/rocks/lib/python2.6/site-packages/openpyxl*
/opt/rocks/lib/python2.6/site-packages/openpyxl:
total 84
drwxr-xr-x 2 root root 4096 May 30 12:26 cell
drwxr-xr-x 2 root root 4096 May 30 12:26 chart
drwxr-xr-x 2 root root 4096 May 30 12:26 chartsheet
drwxr-xr-x 2 root root 4096 May 30 12:26 comments
drwxr-xr-x 2 root root 4096 May 30 12:26 compat
-rw-r--r-- 1 root root 1720 Mar 16 21:35 conftest.py
-rw-r--r-- 1 root root 2111 May 30 12:26 conftest.pyc
drwxr-xr-x 2 root root 4096 May 30 12:26 descriptors
drwxr-xr-x 2 root root 4096 May 30 12:26 drawing
drwxr-xr-x 2 root root 4096 May 30 12:26 formatting
drwxr-xr-x 2 root root 4096 May 30 12:26 formula
-rw-r--r-- 1 root root  880 Mar 16 21:35 __init__.py
-rw-r--r-- 1 root root 1009 May 30 12:26 __init__.pyc
drwxr-xr-x 2 root root 4096 May 30 12:26 packaging
drwxr-xr-x 2 root root 4096 May 30 12:26 reader
drwxr-xr-x 2 root root 4096 May 30 12:26 styles
drwxr-xr-x 2 root root 4096 May 30 12:26 utils
drwxr-xr-x 3 root root 4096 May 30 12:26 workbook
drwxr-xr-x 2 root root 4096 May 30 12:26 worksheet
drwxr-xr-x 2 root root 4096 May 30 12:26 writer
drwxr-xr-x 2 root root 4096 May 30 12:26 xml
/opt/rocks/lib/python2.6/site-packages/openpyxl-2.4.7-py2.6.egg-info:
total 36
-rw-r--r-- 1 root root     1 May 30 12:26 dependency_links.txt
-rw-r--r-- 1 root root 11193 May 30 12:26 installed-files.txt
-rw-r--r-- 1 root root  2381 May 30 12:26 PKG-INFO
-rw-r--r-- 1 root root    16 May 30 12:26 requires.txt
-rw-r--r-- 1 root root  5224 May 30 12:26 SOURCES.txt
-rw-r--r-- 1 root root     9 May 30 12:26 top_level.txt
Any idea how to fix that?
Regards,
Mahmood
Re: embed a package for proper fun script
Thanks. I will try and come back later.
Regards,
Mahmood
On Tuesday, May 30, 2017 2:03 PM, Paul Moore wrote:
On Tuesday, 30 May 2017 08:48:34 UTC+1, Mahmood Naderan wrote:
> No idea?...
>
> Regards,
> Mahmood
>
> On Tuesday, May 30, 2017 1:06 AM, Mahmood Naderan via Python-list wrote:
>
> Hello,
>
> How it is possible to embed a package in my project? I mean, in my python
> script I have written
>
> import openpyxl
>
> So, the user may not have installed that package and doesn't understand what
> is pip!
>
> Please let me know the instructions or any document regarding that.
You might want to look at the zipapp module in the stdlib.
Paul
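As a rough illustration of the zipapp suggestion, the sketch below builds an executable archive from a folder that contains a __main__.py and then runs it with the current interpreter. The folder name myapp, the archive name, and the one-line program are placeholders. In a real project, pure-Python dependencies would first be installed into that folder (for example with pip's --target option). Note that the user still needs a Python interpreter, and compiled extension modules such as numpy cannot be imported from inside a zip archive.

```python
import os
import subprocess
import sys
import tempfile
import zipapp

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "myapp")  # hypothetical project folder
os.mkdir(source)

# __main__.py is what the interpreter runs when the archive is executed.
with open(os.path.join(source, "__main__.py"), "w") as f:
    f.write("print('hello from the archive')\n")

# Pack the folder into a single .pyz file.
target = os.path.join(workdir, "myapp.pyz")
zipapp.create_archive(source, target)

# Run the archive the same way a user would: python myapp.pyz
output = subprocess.check_output([sys.executable, target]).decode().strip()
print(output)
```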
Re: Python not able to find package but it is installed
Well, on Rocks there are multiple Pythons, but by default the active one is
2.6.6:
$ python -V
Python 2.6.6
I have to say that the script doesn't modify sys.path. I only use sys.argv[]
there.
I could put all the dependent modules in my project folder, but that would be
dirty.
Regards,
Mahmood
On Tuesday, May 30, 2017 2:09 PM, Wolfgang Maier wrote:
Re: Python not able to find package but it is installed
Well yes. It looks in other folders
>>> import openpyxl
# trying openpyxl.so
# trying openpyxlmodule.so
# trying openpyxl.py
# trying openpyxl.pyc
# trying /usr/lib64/python2.6/openpyxl.so
# trying /usr/lib64/python2.6/openpyxlmodule.so
# trying /usr/lib64/python2.6/openpyxl.py
# trying /usr/lib64/python2.6/openpyxl.pyc
# trying /usr/lib64/python2.6/plat-linux2/openpyxl.so
# trying /usr/lib64/python2.6/plat-linux2/openpyxlmodule.so
# trying /usr/lib64/python2.6/plat-linux2/openpyxl.py
# trying /usr/lib64/python2.6/plat-linux2/openpyxl.pyc
# trying /usr/lib64/python2.6/lib-dynload/openpyxl.so
# trying /usr/lib64/python2.6/lib-dynload/openpyxlmodule.so
# trying /usr/lib64/python2.6/lib-dynload/openpyxl.py
# trying /usr/lib64/python2.6/lib-dynload/openpyxl.pyc
# trying /usr/lib64/python2.6/site-packages/openpyxl.so
# trying /usr/lib64/python2.6/site-packages/openpyxlmodule.so
# trying /usr/lib64/python2.6/site-packages/openpyxl.py
# trying /usr/lib64/python2.6/site-packages/openpyxl.pyc
# trying /usr/lib64/python2.6/site-packages/gtk-2.0/openpyxl.so
# trying /usr/lib64/python2.6/site-packages/gtk-2.0/openpyxlmodule.so
# trying /usr/lib64/python2.6/site-packages/gtk-2.0/openpyxl.py
# trying /usr/lib64/python2.6/site-packages/gtk-2.0/openpyxl.pyc
# trying /usr/lib/python2.6/site-packages/openpyxl.so
# trying /usr/lib/python2.6/site-packages/openpyxlmodule.so
# trying /usr/lib/python2.6/site-packages/openpyxl.py
# trying /usr/lib/python2.6/site-packages/openpyxl.pyc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named openpyxl
But
$ find /opt -name openpyxl
/opt/rocks/lib/python2.6/site-packages/openpyxl
Regards,
Mahmood
Re: Python not able to find package but it is installed
Consider this output
[root@cluster ~]# pip --version
pip 9.0.1 from /opt/rocks/lib/python2.6/site-packages/pip-9.0.1-py2.6.egg (python 2.6)
[root@cluster ~]# easy_install --version
distribute 0.6.10
[root@cluster ~]# find /opt -name python
/opt/rocks/lib/graphviz/python
/opt/rocks/bin/python
/opt/rocks/usr/bin/python
/opt/python
/opt/python/bin/python
[root@cluster ~]# find /usr -name python
/usr/include/google/protobuf/compiler/python
/usr/bin/python
/usr/share/doc/m2crypto-0.20.2/demo/Zope/lib/python
/usr/share/doc/m2crypto-0.20.2/demo/ZopeX3/install_dir/lib/python
/usr/share/doc/m2crypto-0.20.2/demo/Zope27/install_dir/lib/python
/usr/share/gdb/python
/usr/share/swig/1.3.40/python
[root@cluster ~]# find /opt -name pip
/opt/rocks/lib/python2.6/site-packages/pip-9.0.1-py2.6.egg/pip
/opt/rocks/bin/pip
[root@cluster ~]# find /usr -name pip
[root@cluster ~]#
So, yes, there are multiple versions of Python, and it seems that pip and
python use different search locations. I will try to modify the path to see
what is what.
Regards,
Mahmood
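When pip and python belong to different installations, it helps to ask the interpreter itself which executable is running and where it searches for modules. The sketch below does that. It assumes Python 3.4 or later (importlib.util.find_spec does not exist on the Python 2.6 shown in this thread), and describe_interpreter is a made-up name. Invoking pip as "python -m pip install openpyxl" with the same interpreter is the usual way to make sure both use the same site-packages.

```python
import importlib.util
import sys


def describe_interpreter(module_name):
    """Report which interpreter is running and where it finds module_name."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "search_path": list(sys.path),
        # None means this interpreter cannot import the module.
        "module_location": spec.origin if spec else None,
    }


info = describe_interpreter("json")  # replace "json" with e.g. "openpyxl"
print("interpreter:", info["executable"])
for entry in info["search_path"]:
    print("  searches:", entry)
print("module found at:", info["module_location"])
```

If the reported location is None while pip says the requirement is satisfied, pip is most likely attached to a different interpreter than the one running the script.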
openpyxl reads cell with format
Hello guys...
With openpyxl, it seems that when the content of a cell is something like
"4-Feb", it is read as "2016-02-04 00:00:00", which looks like a calendar
conversion.
How can I read the cell as text instead of getting such an automatic
conversion?
Regards,
Mahmood
Openpyxl cell format
Hello guys,
With openpyxl, it seems that when the content of a cell is something like
"4-Feb", it is read as "2016-02-04 00:00:00", which looks like a calendar
conversion.
How can I read the cell as text instead of getting such an automatic
conversion?
Regards,
Mahmood
Re: Openpyxl cell format
Maybe... But specifically in my case, the Excel file is exported from a web
page. I think there should be a way to read the content as pure text.
Regards,
Mahmood
Re: openpyxl reads cell with format
>if the cell is an Excel date, it IS stored as a numeric
As I said, the "shape" of the cell is similar to a date. The content, which is
"4-Feb", is not a date. It is a string, and I expect cell.value to return it as
"4-Feb" and nothing else.
Also, I said before that the file is downloaded from a website. That means
that, from a button on a web page, I chose "export as excel" to download the
data. I am pretty sure that the auto-format feature of Excel is trying to
convert it to a date.
So, I am looking for a way to ignore such auto formatting.
Regards,
Mahmood
Re: openpyxl reads cell with format
OK, thank you very much. As you said, it seems that it is too late for my
Python script.
Regards,
Mahmood
On Monday, June 5, 2017 10:41 PM, Dennis Lee Bieber wrote:
On Mon, 5 Jun 2017 14:46:18 +0000 (UTC), Mahmood Naderan via Python-list
declaimed the following:
>>if the cell is an Excel date, it IS stored as a numeric
>
>As I said, the "shape" of the cell is similar to date. The content which is
>"4-Feb" is not a date. It is a string which I expect from cell.value to read
>it as "4-Feb" and nothing else.
>
>Also, I said before that the file is downloaded from a website. That means,
>from a button on a web page, I chose "export as excel" to download the data. I
>am pretty sure that auto format feature of the excel is trying to convert it
>as a date.
>
Then you'll have to modify the Excel file before the "export" to tell IT
that the column is plain text BEFORE the data goes into the column.
The normal behavior for Excel is: if something looks like a date (partial
or full) when entered, Excel will store it as a numeric "days from epoch"
and flag the cell as a "date" field. The visual representation is up to the
client -- as my sample table shows, the very same input value looks
different based upon how the column is defined.
>
>So, I am looking for a way to ignore such auto formatting.
>
By the time Python sees it, it is too late -- all you have is an integer
number tagged as a "date", and an import process that renders that number
as a Python datetime object (which you can then render however you want
https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
)
--
Wulfraed Dennis Lee Bieber AF6VN
HTTP://wlfraed.home.netcom.com/
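Since the value arrives as a datetime object, one workaround is to render it back into the form that was displayed. The sketch below turns a datetime into text such as "4-Feb". The helper name day_month_label is invented for illustration, the datetime literal stands in for whatever cell.value returned, and the month abbreviation produced by %b depends on the active locale. This only recreates the display form. It cannot tell whether the exporting site originally meant a date or plain text.

```python
from datetime import datetime


def day_month_label(value):
    """Format a datetime as day-month text, e.g. '4-Feb' (no zero padding)."""
    return "%d-%s" % (value.day, value.strftime("%b"))


# Stand-in for the datetime object that the spreadsheet reader returned.
cell_value = datetime(2016, 2, 4)

label = day_month_label(cell_value)
print(label)
```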
