Re: How to copy paragraphs (with number formatting) and images from Words (.docx) and paste into Excel (.xlsx) using Python

2020-03-22 Thread A S
On Monday, 23 March 2020 01:58:38 UTC+8, Beverly Pope  wrote:
> > On Mar 22, 2020, at 9:47 AM, A S  wrote:
> > 
> > I can't seem to paste pictures into this discussion so please see both my 
> > current and desired Excel output here:
> > 
> > https://stackoverflow.com/questions/60800494/how-to-copy-paragraphs-with-number-formatting-and-images-from-words-docx-an
> Did you try using the 2 part answer on the stackoverflow webpage?
> 
> Bev in TX

I'm able to get the paragraphs copied correctly now! But I'm trying to figure
out whether there's a way to copy the images into the Excel file along with
the paragraphs. Do you have an idea? :)
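A .docx file is a ZIP archive: the body text lives in word/document.xml, embedded pictures are stored under word/media/, and word/_rels/document.xml.rels maps each picture's relationship id (rId...) to its file. A minimal sketch under those assumptions, using only the standard library rather than python-docx, walks the paragraphs in order and returns text and image bytes together, so each item can go into the next Excel row. The function name read_docx_items is hypothetical, and the sketch assumes relationship targets are relative to word/ (the usual layout).

```python
import zipfile
import xml.etree.ElementTree as ET

# OOXML namespaces used in word/document.xml and its relationships part
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"
PKG = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def read_docx_items(path):
    """Return ('text', str) and ('image', (part_name, bytes)) items in document order."""
    items = []
    with zipfile.ZipFile(path) as zf:
        body = ET.fromstring(zf.read("word/document.xml"))
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
        # Map relationship ids (e.g. rId5) to part names (e.g. word/media/image1.png).
        # Assumption: targets are relative to word/, as Word normally writes them.
        targets = {rel.get("Id"): "word/" + rel.get("Target")
                   for rel in rels.iter(PKG + "Relationship")}
        for p in body.iter(W + "p"):
            # Concatenate the text runs of this paragraph.
            text = "".join(t.text or "" for t in p.iter(W + "t"))
            if text:
                items.append(("text", text))
            # An embedded picture is referenced by an a:blip r:embed="rIdN" attribute.
            for blip in p.iter(A + "blip"):
                rid = blip.get(R + "embed")
                if rid in targets:
                    name = targets[rid]
                    items.append(("image", (name, zf.read(name))))
    return items
```

Each item can then be written with xlsxwriter: worksheet.write(row, 0, text) for text, or worksheet.insert_image(row, 0, name, {"image_data": io.BytesIO(data)}) for a picture. Note that Word's automatic list numbering (e.g. "1.1") is generated from numbering definitions and is not part of the w:t text, so this sketch does not reproduce it.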
-- 
https://mail.python.org/mailman/listinfo/python-list


How to copy paragraphs (with number formatting) and images from Words (.docx) and paste into Excel (.xlsx) using Python

2020-03-22 Thread A S
I have contract clauses in Word (.docx) format that need to be frequently
copied and pasted into Excel (.xlsx) to be sent to a third party. The clauses
are often updated, so there's always a need to copy these clauses over again. I
only need to copy all the paragraphs and images after the contents page. Here
is a sample of the clause document
(https://drive.google.com/open?id=1ZzV29R6y2q0oU3HAVrqsFa158OhvpxEK).

I have tried writing Python code to achieve this. Here is the code I have so
far:

# In Jupyter: !pip install python-docx  (in a terminal: pip install python-docx)
import docx
import xlsxwriter

document = docx.Document("Clauses Sample.docx")
wb = xlsxwriter.Workbook('C://xx//clauses sample.xlsx')

docText = []
index_row = 0
Sheet1 = wb.add_worksheet("Shee")

for paragraph in document.paragraphs:
    if paragraph.text:
        docText.append(paragraph.text)
        xx = '\n'.join(docText)

        Sheet1.write(index_row, 0, xx)

        index_row = index_row + 1

wb.close()
#print(xx)
However, my Excel file output looks like this:

I can't seem to paste pictures into this discussion so please see both my 
current and desired Excel output here:

https://stackoverflow.com/questions/60800494/how-to-copy-paragraphs-with-number-formatting-and-images-from-words-docx-an


Re: Extract all words between two keywords in .txt file (Python)

2019-12-11 Thread A S
On Thursday, 12 December 2019 04:55:46 UTC+8, Joel Goldstick  wrote:
> On Wed, Dec 11, 2019 at 1:31 PM Ben Bacarisse  wrote:
> >
> > A S  writes:
> >
> > > I would like to extract all words within specific keywords in a .txt
> > > file. For the keywords, there is a starting keyword of "PROC SQL;" (I
> > > need this to be case insensitive) and the ending keyword could be
> > > either "RUN;", "quit;" or "QUIT;". This is my sample .txt file.
> > >
> > > Thus far, this is my code:
> > >
> > > with open('lan sample text file1.txt') as file:
> > >     text = file.read()
> > > regex = re.compile(r'(PROC SQL;|proc sql;(.*?)RUN;|quit;|QUIT;)')
> > > k = regex.findall(text)
> > > print(k)
> >
> > Try
> >
> >   re.compile(r'(?si)(PROC SQL;.*(?:QUIT|RUN);)')
> >
> > Read up on what (?si) means and what (?:...) means. You can do the
> > same by passing flags to the compile method.
> >
> > > Output:
> > >
> > > [('quit;', ''), ('quit;', ''), ('PROC SQL;', '')]
> >
> > Your main issue is that | binds weakly.  Your whole pattern tries to
> > match any one of just four short sub-patterns:
> >
> > PROC SQL;
> > proc sql;(.*?)RUN;
> > quit;
> > QUIT;
> >
> > --
> > Ben.
> 
> Consider using Python string functions.
> 
> 1. Read your string; let's call it s.
> 2. start = s.find("PROC SQL;")
>    This will find the starting index point. It returns an index.
> 3. Do the same for each of the three possible ending strings. Use if/else.
> 4. This will give you your ending index.
> 5. Slice out the included string, taking into account that the start is
>    start + len("PROC SQL;") and the end is the ending index that find()
>    returns, which already points at the beginning of whichever ending
>    string matched in your case.
> 
> Regular expressions are powerful, but not so easy to read unless you
> are really into them.
> -- 
> Joel Goldstick
> http://joelgoldstick.com/blog
> http://cc-baseballstats.info/stats/birthdays

Hey Joel, I'm not too sure I get the idea of your code implementation.
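Joel's string-method steps can be sketched as below. This is one possible reading, not his code: the name extract_blocks is hypothetical, matching is made case-insensitive by searching a lowercased copy (so the default keywords are lowercase), and the slice keeps the keywords because the intended output includes them. It assumes lower() does not change the string length, which holds for ASCII text like the SAS samples.

```python
def extract_blocks(s, start_kw="proc sql;", end_kws=("run;", "quit;")):
    """Return each block from start_kw to the nearest end keyword, inclusive."""
    lower = s.lower()  # search this copy, but slice the original string
    blocks = []
    pos = 0
    while True:
        start = lower.find(start_kw, pos)
        if start == -1:
            break
        after_start = start + len(start_kw)
        # Position of each ending keyword after the start; -1 means "not found".
        candidates = [(lower.find(kw, after_start), kw) for kw in end_kws]
        candidates = [(i, kw) for i, kw in candidates if i != -1]
        if not candidates:
            break  # a start keyword with no ending keyword after it
        end, kw = min(candidates)  # whichever ending keyword comes first
        blocks.append(s[start:end + len(kw)])
        pos = end + len(kw)
    return blocks
```

For example, extract_blocks(text) on the sample file text would return one string per PROC SQL; block.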


Re: Extract all words between two keywords in .txt file (Python)

2019-12-11 Thread A S
On Thursday, 12 December 2019 02:28:09 UTC+8, Ben Bacarisse  wrote:
> A S  writes:
> 
> > I would like to extract all words within specific keywords in a .txt
> > file. For the keywords, there is a starting keyword of "PROC SQL;" (I
> > need this to be case insensitive) and the ending keyword could be
> > either "RUN;", "quit;" or "QUIT;". This is my sample .txt file.
> >
> > Thus far, this is my code:
> >
> > with open('lan sample text file1.txt') as file:
> >     text = file.read()
> > regex = re.compile(r'(PROC SQL;|proc sql;(.*?)RUN;|quit;|QUIT;)')
> > k = regex.findall(text)
> > print(k)
> 
> Try
> 
>   re.compile(r'(?si)(PROC SQL;.*(?:QUIT|RUN);)')
> 
> Read up on what (?si) means and what (?:...) means. You can do the
> same by passing flags to the compile method.
> 
> > Output:
> >
> > [('quit;', ''), ('quit;', ''), ('PROC SQL;', '')]
> 
> Your main issue is that | binds weakly.  Your whole pattern tries to
> match any one of just four short sub-patterns:
> 
> PROC SQL;
> proc sql;(.*?)RUN;
> quit;
> QUIT;
> 
> -- 
> Ben.

Hey Ben, this works for my sample .txt file! Thanks :) But it won't work if I
have multiple other text files to parse that are similar but have some
variations, strangely enough.
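One likely cause, offered as a guess: in the suggested pattern, .* is greedy, so when a file holds several blocks a single match runs from the first PROC SQL; to the last QUIT; or RUN; in the file. A non-greedy variant, with the flags passed to compile() instead of inline, is sketched below; BLOCK_RE and sql_blocks are hypothetical names.

```python
import re

# re.IGNORECASE makes "proc sql;" and "QUIT;" match in any case;
# re.DOTALL lets "." cross newlines. ".*?" is non-greedy, so each match
# stops at the first QUIT;/RUN; after its PROC SQL;. The \b anchors stop
# words such as "rerun;" from counting as an ending keyword.
BLOCK_RE = re.compile(r"\bPROC SQL;.*?\b(?:QUIT|RUN);", re.IGNORECASE | re.DOTALL)


def sql_blocks(text):
    """Return each PROC SQL; ... QUIT;/RUN; block, keywords included."""
    # No capturing groups, so findall returns whole matches, not tuples.
    return BLOCK_RE.findall(text)
```

Because the pattern has no capturing groups, findall() returns plain strings rather than the tuples seen in the original output.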


Extract all words between two keywords in .txt file (Python)

2019-12-11 Thread A S
I would like to extract all words within specific keywords in a .txt file. For 
the keywords, there is a starting keyword of "PROC SQL;" (I need this to be 
case insensitive) and the ending keyword could be either "RUN;", "quit;" or 
"QUIT;". This is my sample .txt file.

Thus far, this is my code:

import re

with open('lan sample text file1.txt') as file:
    text = file.read()
regex = re.compile(r'(PROC SQL;|proc sql;(.*?)RUN;|quit;|QUIT;)')
k = regex.findall(text)
print(k)


Output:

[('quit;', ''), ('quit;', ''), ('PROC SQL;', '')]

However, my intended output is to get the words in between and inclusive of the 
keywords:

proc sql; ("TRUuuuth");
hhhjhfjs as fdsjfsj:
select * from djfkjd to jfkjs
(
SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format '-xx') gff &jfjfsj_jfjfj.) and 
  (xxx(xx_ix as format '-xx') lec &jgjsd_vnv.))
 );

1)

jj;

  select xx("xE'", PUT(xx..),"'") jdfjhf:jhfjj from _x_xx_L ;
quit; 

PROC SQL; ("CUuuuth");
hhhjhfjs as fdsjfsj:
select * from djfkjd to jfkjs
(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format '-xx') gff &jfjfsj_jfjfj.) and 
  (xxx(xx_ix as format '-xx') lec &jgjsd_vnv.))(( ))
 );

2)(

RUN;


Any advice or different ways to go about this would be greatly appreciated!


Re: Extract sentences in nested parentheses using Python

2019-12-05 Thread A S
On Tuesday, 3 December 2019 23:48:21 UTC+8, Peter Otten  wrote:
> A S wrote:
> 
> > On Tuesday, 3 December 2019 01:01:25 UTC+8, Peter Otten  wrote:
> >> A S wrote:
> >> 
> >> I think I've seen this question before ;)
> >> 
> >> > I am trying to extract all strings in nested parentheses (along with
> >> > the parentheses itself) in my .txt file. Please see the sample .txt
> >> > file that I have used in this example here:
> >> > (https://drive.google.com/open?id=1UKc0ZgY9Fsz5O1rSeBCLqt5dwZkMaQgr).
> >> > 
> >> > I have tried and done up three different codes but none of them seems
> >> > to be able to extract all the nested parentheses. They can only extract
> >> > a portion of the nested parentheses. Any advice on what I've done wrong
> >> > could really help!
> >> > 
> >> > Here are the three codes I have done so far:
> >> > 
> >> > 1st attempt:
> >> > 
> >> > import re
> >> > from os.path import join
> >> > 
> >> > def balanced_braces(args):
> >> >     parts = []
> >> >     for arg in args:
> >> >         if '(' not in arg:
> >> >             continue
> >> 
> >> There could still be a ")" that you miss
> >> 
> >> >         chars = []
> >> >         n = 0
> >> >         for c in arg:
> >> >             if c == '(':
> >> >                 if n > 0:
> >> >                     chars.append(c)
> >> >                 n += 1
> >> >             elif c == ')':
> >> >                 n -= 1
> >> >                 if n > 0:
> >> >                     chars.append(c)
> >> >                 elif n == 0:
> >> >                     parts.append(''.join(chars).lstrip().rstrip())
> >> >                     chars = []
> >> >             elif n > 0:
> >> >                 chars.append(c)
> >> >     return parts
> >> 
> >> It's probably easier to understand and implement when you process the
> >> complete text at once. Then arbitrary splits don't get in the way of your
> >> quest for ( and ). You just have to remember the position of the first
> >> opening ( and number of opening parens that have to be closed before you
> >> take the complete expression:
> >> 
> >> level:  00011112221000
> >> text:   abc(def(gh))ij
> >> when we are here   ^
> >>     we need^
> >> 
> >> A tentative implementation:
> >> 
> >> $ cat parse.py
> >> import re
> >> 
> >> NOT_SET = object()
> >> 
> >> def scan(text):
> >>     level = 0
> >>     start = NOT_SET
> >>     for m in re.compile("[()]").finditer(text):
> >>         if m.group() == ")":
> >>             level -= 1
> >>             if level < 0:
> >>                 raise ValueError("underflow: more closing than opening parens")
> >>             if level == 0:
> >>                 # outermost closing parenthesis:
> >>                 # deliver enclosed string including parens.
> >>                 yield text[start:m.end()]
> >>                 start = NOT_SET
> >>         elif m.group() == "(":
> >>             if level == 0:
> >>                 # outermost opening parenthesis: remember position.
> >>                 assert start is NOT_SET
> >>                 start = m.start()
> >>             level += 1
> >>         else:
> >>             assert False
> >>     if level > 0:
> >>         raise ValueError("unclosed parens remain")
> >> 
> >> 
> >> if __name__ == "__main__":
> >>     with open("lan sample text file.txt") as instream:
> >>         text = instream.read()
> >>     for chunk in scan(text):
> >>         print(chunk)
> >> $ python3 parse.py
> >> ("xE'", PUT(xx..),"'")
> >> ("TRUuuuth")
> > 
> > Hello Peter! I tried this on my actual working files and it returned this
> > error: "unclosed parens remain". In this case, how can I continue to parse
> > through my text files by only extracting those with balanced parentheses
> > and ignore those that are incomplete?
> 
> import sys
> 
> filenames = ...
> for filename in filenames:
>     with open(filename) as instream:
>         text = instream.read()
>     try:
>         chunks = list(scan(text))
>     except ValueError as err:
>         print(f"{err} in file {filename!r}", file=sys.stderr)
>     else:
>         for chunk in chunks:
>             print(chunk)

Hey Peter, it works! Thank you :)


Re: Extract sentences in nested parentheses using Python

2019-12-05 Thread A S
On Tuesday, 3 December 2019 06:22:52 UTC+8, DL Neil  wrote:
> On 3/12/19 6:00 AM, Peter Otten wrote:
> > A S wrote:
> > I think I've seen this question before ;)
> 
> In addition to 'other reasons' for @Peter's comment, it is a common 
> ComSc worked-problem or assignment. (in which case, we'd appreciate 
> being told that you/OP is asking for help with "homework")
> 
> 
> >> I am trying to extract all strings in nested parentheses (along with the
> >> parentheses itself) in my .txt file. Please see the sample .txt file that
> >> I have used in this example here:
> >> (https://drive.google.com/open?id=1UKc0ZgY9Fsz5O1rSeBCLqt5dwZkMaQgr).
> >>
> >> I have tried and done up three different codes but none of them seems to
> >> be able to extract all the nested parentheses. They can only extract a
> >> portion of the nested parentheses. Any advice on what I've done wrong
> >> could really help!
> 
> One approach is to research in the hope that there are already existing 
> tools or techniques which may help/save you from 'reinventing the wheel' 
> - when you think about it, a re-statement of open-source objectives.
> 
> How does the Python interpreter break-down Python (text) code into its 
> constituent parts ("tokens") *including* parentheses? Such are made 
> available in (perhaps) a lesser-known corner of the PSL (Python Standard 
> Library). Might you be able to use one such tool?
> 
> The ComSc technique which sprang to mind involves "stacks" (a LIFO data 
> structure) and "RPN" (Reverse Polish Notation). Whereas we like people 
> to take their turn when it comes to limited resources, eg to form a 
> "queue" to purchase/pay for goods at the local store, which is "FIFO" 
> (first-in, first-out); a "stack"/LIFO (last-in, first-out) can be 
> problematic to put into practical application. There are plenty of 
> Python implementations or you can 'roll your own' with a list. Again, 
> I'd likely employ a "deque" from the PSL's collections module (although
> as a "stack" rather than as a "double-ended queue"), because the 
> optimisation comes "free". (to my laziness, but after some kind soul 
> sweated-bullets to make it fast (in both senses) for 'the rest of us'!)
> 
> 
> > It's probably easier to understand and implement when you process the
> > complete text at once. Then arbitrary splits don't get in the way of your
> > quest for ( and ). You just have to remember the position of the first
> > opening ( and number of opening parens that have to be closed before you
> > take the complete expression:
> 
> +1
> but as a 'silver surfer', I don't like to be asked to "remember" anything!
> 
> 
> > level:  000100
> >  we need^
> 
> 
> Consider:
> original_text (the contents of the .txt file - add buffering if volumes 
> are huge)
> current_text (the characters we have processed/"recognised" so-far)
> stack (what an original name for such a data-structure! Which will 
> contain each of the partial parenthetical expressions found - but yet to 
> be proven/made complete)
> 
> set current_text to NULSTRING
> for each current_character in original_text:
>     if current_character is LEFT_PARENTHESIS:
>         push current_text to stack
>         set current_text to LEFT_PARENTHESIS
>     else:
>         concatenate current_character with current_text
>     if current_character is RIGHT_PARENTHESIS:
>         # current_text is a parenthetical expression
>         # do with it what you will
>         pop the stack
>         set current_text to the ex-stack string \
>             concat current_text's p-expn
> 
> Once working: cover 'special cases' (after above loop), eg original_text 
> which doesn't begin and/or end with parentheses; and error cases, eg 
> pop-ping a NULSTRING, or thinking things are finished but the stack is 
> not yet empty - likely events from unbalanced parentheses!
> 
> original text = "abc(def(gh))ij"
> 
> event 1: in-turn, concatenate characters "abc" as current_text
> event 2: locate (first) left-parenthesis, push current_text to stack(&)
> event 3: concatenate "(def"
> event 4: push, likewise
> event 5: concatenate "(gh"
> event 6: locate (first) right-parenthesis (matches the left-parenthesis
> beginning current_text!)
> result?: ?print current_text?
> event 7: pop stack and redefine current_text as &
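DL Neil's stack outline can be sketched in Python roughly as follows. This is one reading of his pseudocode, not his code: it uses collections.deque as the stack, as he suggests, and collects every parenthetical expression, inner ones first, rather than only the outermost ones. The name paren_groups is hypothetical, and unbalanced input raises ValueError, covering the error cases he mentions.

```python
from collections import deque


def paren_groups(text):
    """Return every balanced (...) group in text, innermost groups first."""
    stack = deque()   # holds the enclosing text saved at each '('
    current = ""      # characters gathered at the current nesting level
    groups = []
    for ch in text:
        if ch == "(":
            stack.append(current)  # push the text gathered so far
            current = ch           # start a new parenthetical expression
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced: ')' without matching '('")
            current += ch
            groups.append(current)            # a complete expression
            current = stack.pop() + current   # resume the enclosing text
        else:
            current += ch
    if stack:
        raise ValueError("unbalanced: '(' left open")
    return groups
```

On the "abc(def(gh))ij" example this yields "(gh)" first and then "(def(gh))".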

Re: Extract sentences in nested parentheses using Python

2019-12-03 Thread A S
On Tuesday, 3 December 2019 01:01:25 UTC+8, Peter Otten  wrote:
> A S wrote:
> 
> I think I've seen this question before ;)
> 
> > I am trying to extract all strings in nested parentheses (along with the
> > parentheses itself) in my .txt file. Please see the sample .txt file that
> > I have used in this example here:
> > (https://drive.google.com/open?id=1UKc0ZgY9Fsz5O1rSeBCLqt5dwZkMaQgr).
> > 
> > I have tried and done up three different codes but none of them seems to
> > be able to extract all the nested parentheses. They can only extract a
> > portion of the nested parentheses. Any advice on what I've done wrong
> > could really help!
> > 
> > Here are the three codes I have done so far:
> > 
> > 1st attempt:
> > 
> > import re
> > from os.path import join
> > 
> > def balanced_braces(args):
> >     parts = []
> >     for arg in args:
> >         if '(' not in arg:
> >             continue
> 
> There could still be a ")" that you miss
> 
> >         chars = []
> >         n = 0
> >         for c in arg:
> >             if c == '(':
> >                 if n > 0:
> >                     chars.append(c)
> >                 n += 1
> >             elif c == ')':
> >                 n -= 1
> >                 if n > 0:
> >                     chars.append(c)
> >                 elif n == 0:
> >                     parts.append(''.join(chars).lstrip().rstrip())
> >                     chars = []
> >             elif n > 0:
> >                 chars.append(c)
> >     return parts
> 
> It's probably easier to understand and implement when you process the 
> complete text at once. Then arbitrary splits don't get in the way of your 
> quest for ( and ). You just have to remember the position of the first 
> opening ( and number of opening parens that have to be closed before you 
> take the complete expression:
> 
> level:  00011112221000
> text:   abc(def(gh))ij
> when we are here   ^
>     we need^
> 
> A tentative implementation:
> 
> $ cat parse.py
> import re
> 
> NOT_SET = object()
> 
> def scan(text):
>     level = 0
>     start = NOT_SET
>     for m in re.compile("[()]").finditer(text):
>         if m.group() == ")":
>             level -= 1
>             if level < 0:
>                 raise ValueError("underflow: more closing than opening parens")
>             if level == 0:
>                 # outermost closing parenthesis:
>                 # deliver enclosed string including parens.
>                 yield text[start:m.end()]
>                 start = NOT_SET
>         elif m.group() == "(":
>             if level == 0:
>                 # outermost opening parenthesis: remember position.
>                 assert start is NOT_SET
>                 start = m.start()
>             level += 1
>         else:
>             assert False
>     if level > 0:
>         raise ValueError("unclosed parens remain")
> 
> 
> if __name__ == "__main__":
>     with open("lan sample text file.txt") as instream:
>         text = instream.read()
>     for chunk in scan(text):
>         print(chunk)
> $ python3 parse.py
> ("xE'", PUT(xx..),"'")
> ("TRUuuuth")

Hello Peter! I tried this on my actual working files and it returned this 
error: "unclosed parens remain". In this case, how can I continue to parse 
through my text files by only extracting those with balanced parentheses and 
ignore those that are incomplete?
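Peter's reply in this thread skips any file whose scan raises ValueError. If you would rather keep the balanced groups from a file that also contains stray parentheses, one possible variant of his scan() is sketched below (scan_lenient is a hypothetical name): it ignores a ")" that has nothing to close and silently drops a group still open at the end of the text. Caveat: an unmatched "(" early in a file hides everything after it, because all later groups are treated as nested inside it.

```python
import re

PAREN_RE = re.compile(r"[()]")


def scan_lenient(text):
    """Yield each outermost balanced (...) group; skip unmatched parens."""
    level = 0
    start = None
    for m in PAREN_RE.finditer(text):
        if m.group() == "(":
            if level == 0:
                start = m.start()  # remember where the outermost group opens
            level += 1
        elif level > 0:  # a ')' that closes an open '('
            level -= 1
            if level == 0:
                yield text[start:m.end()]
        # otherwise: a stray ')' with nothing open, so ignore it
    # a group still open here is unclosed and is simply not yielded
```

Usage mirrors Peter's loop: for chunk in scan_lenient(text): print(chunk).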


Extract sentences in nested parentheses using Python

2019-12-02 Thread A S
I am trying to extract all strings in nested parentheses (along with the
parentheses themselves) in my .txt file. Please see the sample .txt file that I
have used in this example here:
(https://drive.google.com/open?id=1UKc0ZgY9Fsz5O1rSeBCLqt5dwZkMaQgr).

I have tried three different pieces of code, but none of them can extract all
the nested parentheses; each extracts only a portion of them. Any advice on
what I've done wrong would really help!

Here are my three attempts so far:

1st attempt:

import re
from os.path import join

def balanced_braces(args):
    parts = []
    for arg in args:
        if '(' not in arg:
            continue
        chars = []
        n = 0
        for c in arg:
            if c == '(':
                if n > 0:
                    chars.append(c)
                n += 1
            elif c == ')':
                n -= 1
                if n > 0:
                    chars.append(c)
                elif n == 0:
                    parts.append(''.join(chars).lstrip().rstrip())
                    chars = []
            elif n > 0:
                chars.append(c)
    return parts

with open('lan sample text file.txt', 'r') as fd:
    #for words in fd.readlines():
    t1 = balanced_braces(fd)
    print(t1)


Output:

['"xE\'", PUT(xx..),"\'"', '"TRUuuuth"', "xxx(xx_ix as format '-xx') 
lec &jgjsd_vnv.", '"xE\'", PUT(xx..),"\'"', '"CUuuuth"', "xxx(xx_ix as 
format '-xx') lec &jgjsd_vnv."]



2nd attempt:

from pyparsing import nestedExpr

matchedParens = nestedExpr('(', ')')
with open('lan sample text file.txt', 'r') as fd:
    for words in fd.readlines():
        for e in matchedParens.searchString(words):
            print(e)


Output:

[['"xE\'"', ',', 'PUT', ['xx..'], ',', '"\'"']]
[['"TRUuuuth"']]
[['xxx', ['xx_ix', 'as', 'format', "'-xx'"], 'gff', '&jfjfsj_jfjfj.']]
[['xxx', ['xx_ix', 'as', 'format', "'-xx'"], 'lec', '&jgjsd_vnv.']]
[['"xE\'"', ',', 'PUT', ['xx..'], ',', '"\'"']]
[['"CUuuuth"']]
[['xxx', ['xx_ix', 'as', 'format', "'-xx'"], 'gff', '&jfjfsj_jfjfj.']]
[['xxx', ['xx_ix', 'as', 'format', "'-xx'"], 'lec', '&jgjsd_vnv.']]



3rd attempt:

def parse_segments(source, recurse=False):

    unmatched_count = 0
    start_pos = 0
    opened = False
    open_pos = 0
    cur_pos = 0

    finished = []
    segments = []

    for character in source:
        #scan for mismatched parenthesis:
        if character == '(':
            unmatched_count += 1
            if not opened:
                open_pos = cur_pos
            opened = True

        if character == ')':
            unmatched_count -= 1

        if opened and unmatched_count == 0:
            segment = source[open_pos:cur_pos+1]
            segments.append(segment)
            clean = source[start_pos:open_pos]
            if clean:
                finished.append(clean)
            opened = False
            start_pos = cur_pos+1

        cur_pos += 1

    # assert unmatched_count == 0

    if start_pos != cur_pos:
        #get anything that was left over here
        finished.append(source[start_pos:cur_pos])

    #now check on recursion:
    for item in segments:
        #get rid of bounding parentheses:
        pruned = item[1:-1]
        if recurse:
            # recurse into this same function (lists use extend, not expand)
            results = parse_segments(pruned, recurse)
            finished.extend(results)
        else:
            finished.append(pruned)

    return finished

with open('lan sample text file.txt', 'r') as fd:
    for words in fd.readlines():
        t = parse_segments(words)
        print(t)


Output:

['k;\n']
['\n']
['  select xx', ' jdfjhf:jhfjj from _x_xx_L ;\n', '"xE\'", 
PUT(xx..),"\'"']
['quit; \n']
['\n']
['/* 1.x FROM _x_Ex_x */ \n']
['proc sql; ', ';\n', '"TRUuuuth"']
['hhhjhfjs as fdsjfsj:\n']
['select * from djfkjd to jfkjs\n']
['(\n']
['SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, 
fjkdsf_jfkj\n']
['\tFROM &xxx..xxx_xxx_xxE\n']
["where ((xxx(xx_ix as format '-xx') gff &jfjfsj_jfjfj.) and \n"]
['  ', ')\n', "xxx(xx_ix as format '-xx') lec &jgjsd_vnv."]
[' );\n']
['\n']
['\n']
['jj;\n']
['\n']
['  select xx', ' jdfjhf:jhfjj from _x_xx_L ;\n', '"xE\'", 
PUT(xx..),"\'"']
['quit; \n']
['\n']
['/* 1.x FROM _x_Ex_x */ \n']
['proc sql; ', ';\n', '"CUuuuth"']
['hhhjhfjs as fdsjfsj:\n']
['select * from djfkjd to jfkjs\n']
['(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, 
fjkdsf_jfkj\n']
['\tFROM &xxx..xxx_xxx_xxE\n']
["where ((xxx(xx_ix as format '-xx') gff &jfjfsj_jfjfj.) and \n"]
['  ', ')\n', "xxx(xx_ix as format '-xx') lec &jgjsd_vnv."]
[' );']




My intended output, which I am unable to get, should look something like this:


("xE'", PUT(xx..),"'")
("TRUuuuth")
(
SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format '-xx') gff &jfjfsj_jfjfj.) and
  (xxx(xx_ix a

Finding lines in .txt file that contain keywords from two different set()

2019-09-08 Thread A S
My problem seems complex, but I hope to make it sound as simple as
possible. Let me unpack the details:

1. I have one folder of Excel (.xlsx) files that serve as a data dictionary.

-In Cell A1, the data source name is written in between brackets

-Cols C:D contain the data field names (a field name could be in either col C or
D in my actual Excel sheet, so I had to search both columns)

-*Important: I need to know which data source the field names come from

2. I have another folder of Text (.txt) files that I need to parse through to 
find these keywords.

These are the folders used for a better reference ( 
https://drive.google.com/open?id=1_LcceqcDhHnWW3Nrnwf5RkXPcnDfesq ). The files 
are found in the folder.

This is the code I have thus far...:

import os, sys
from os.path import join
import re
import xlrd
from xlrd import open_workbook
import openpyxl
from openpyxl.reader.excel import load_workbook
import xlsxwriter


#All the paths
dict_folder = 'C:/Users//Documents//Test Excel'
text_folder = 'C:/Users//Documents//Text'

words = set()
fieldset = set()
for file in os.listdir(dict_folder):
    if file.endswith(".xlsx"):
        wb1 = load_workbook(join(dict_folder, file), data_only = True)
        ws = wb1.active
        #Here I am reading and printing all the data source names set(words) in the excel dictionaries:
        cellvalues = ws["A1"].value
        wordsextract = re.findall(r"\((.+?)\)", str(cellvalues))
        results = wordsextract[0]
        words.add(results)
        print(results)

        for rowofcellobj in ws["C" : "D"]:
            for cellobj in rowofcellobj:
                #2. Here I am printing all the field names in col C & D in the excel dictionaries:
                data = re.findall(r"\w+_.*?\w+", str(cellobj.value))
                if data != []:
                    fields = data[0]
                    fieldset.add(fields)
        print(fieldset)
        #listing = str.remove("")
        #print(listing)


#Here I am reading the name of each .txt file to the separate .xlsx file:
for r, name in enumerate(os.listdir(text_folder)):
    if name.endswith(".txt"):
        print(name)

#Reading .txt file and trying to make the sentence into words instead of lines so that I can compare the individual .txt file words with the .xlsx file
txtfilespath = os.chdir("C:/Users//Documents//Text")


#Here I am reading and printing all the words in the .txt files and compare with the excel Cell A1:
for name in os.listdir(txtfilespath):
    if name.endswith(".txt"):
        with open(name, "r") as texts:
            # Read each line of the file:
            s = texts.read()
            print(s)

            #if .txt files contain.() or select or from or words from sets..search that sentence and extract the common fields

            result1 = []
            parens = 0
            buff = ""
            for line in s:
                if line == "(":
                    parens += 1
                if parens > 0:
                    buff += line
                if line == ")":
                    parens -= 1
                if not parens and buff:
                    result1.append(buff)
                    buff = ""
            set(result1)

            #Here, I include other keywords other than those found in the Excel workbooks
            checkhere = set()
            checkhere.add("Select")
            checkhere.add("From")
            checkhere.add("select")
            checkhere.add("from")
            checkhere.add("SELECT")
            checkhere.add("FROM")
            # k = list(checkhere)
            # print(k)

            #I only want to read/ extract the lines containing brackets () as well as the keywords in the checkhere set. So that I can check capture the source and field in each line:
            #I tried this but nothing was printed..
            for element in checkhere:
                if element in result1:
                    print(result1)


My desired output, which my code fails to print, is:

(/* 1.select_no., bi FROM apple_x_Ex_x */ 
 proc sql; "TRUuuuth")

(/* 1.x FROM x*/ 
proc sql; "TRUuuuth")

(SELECT abc AS abc1, ab33_2_ AS mon, a_rr, iirir_vf, jk_ff, sfa_jfkj
FROM &orange..xxx_xxx_xxE
 where (asre(kkk_ix as format '-xx') gff &bcbcb_hhaha.) and 
  (axx(xx_ix as format '-xx') lec &jgjsd_vnv.)
 )

 (/* 1.select_no. FROM apple_x_Ex_x */ 
 proc sql; "TRUuuuth")

 (SELECT abc AS kf, mcfg_2_ AS dokn, b_rr, jjhj_vf, jjjk_hj, fjjh_jhjkj
FROM &bfbd..pear_xxx_xxE
 where (afdfe(kkffk_ix as format 'd-xx') gdaff &bcdadabcb_hdahaha.) and 
  (axx(xx_ix as format '-xx') lec &jgjsdfdf_vnv.)
 )



Once I can get the desired output above, I will compare these lines against
the words set() and the fieldset set().

Any help would really be appreciated here..thank you
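One likely reason nothing was printed: element in result1 asks whether a keyword is equal to a whole list item, and every item is a full parenthesised chunk, so the test is always False. A substring test per chunk is probably what was intended. A minimal sketch, with the hypothetical helper name chunks_with_keywords, matching case-insensitively so that Select/SELECT/select need only one entry:

```python
def chunks_with_keywords(chunks, keywords):
    """Return the chunks that contain at least one keyword, ignoring case."""
    lowered = {kw.lower() for kw in keywords}
    # 'kw in chunk' is a substring test on each chunk, not list membership
    return [chunk for chunk in chunks
            if any(kw in chunk.lower() for kw in lowered)]
```

For the comparison step, one could call chunks_with_keywords(result1, {"select", "from"} | words | fieldset). This is plain substring matching, so "from" would also match inside a longer word; a regex with \b word boundaries would be stricter.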


How to only read words within brackets/ parentheses (in .txt file) using Python

2019-09-04 Thread A S
I understand that reading lines in .txt files would look something like this in 
Python:


with open('filename','r') as fd:
   lines = fd.readlines()


However, how do I make my code read only the words in my .txt files that are
within balanced parentheses?

I am not sure how to go about it. Let's say my .txt file contains lines like
this:

k;

select xx("xE'", PUT(xx..),"'") jdfjhf:jhfjj from _x_xx_L ;
quit; 

/* 1.x FROM _x_Ex_x */ 
proc sql; "TRUuuuth");
hhhjhfjs as fdsjfsj:
select * from djfkjd to jfkjs
(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
FROM &xxx..xxx_xxx_xxE
   where (xxx(xx_ix as format '-xx') gff &jfjfsj_jfjfj.) and 
  (xxx(xx_ix as format '-xx') lec &jgjsd_vnv.)
);


The main idea is to read only these portions of the .txt file (i.e. Those 
within parentheses):

 ("xE'", PUT(xx..),"'") jdfjhf:jhfjj from _x_xx_L ;
quit; 

/* 1.x FROM _x_Ex_x */ 
proc sql; "TRUuuuth")

(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
FROM &xxx..xxx_xxx_xxE
   where (xxx(xx_ix as format '-xx') gff &jfjfsj_jfjfj.) and 
  (xxx(xx_ix as format '-xx') lec &jgjsd_vnv.)
)



Any help will be truly appreciated


Calling a matlab script from python

2007-09-05 Thread n o s p a m p l e a s e
Suppose I have a matlab script mymatlab.m. How can I call this script
from a python script?

Thanx/NSP



Re: Calling a dos batch file from python

2007-09-05 Thread n o s p a m p l e a s e
On Sep 4, 5:01 pm, [EMAIL PROTECTED] wrote:
> On Sep 4, 8:42 am, n o s p a m p l e a s e <[EMAIL PROTECTED]>
> wrote:
>
> > Suppose I have a batch file called mybatch.bat  and I want to run it
> > from a python script. How can I call this batch file in python script?
>
> > Thanx/NSP
>
> The subprocess module should work.
>
Thanx to all those who responded. It was quite simple.

import os
os.system("mybatch.bat")

NSP
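The subprocess route suggested in the thread could look like the sketch below; the os.system documentation itself recommends the subprocess module for new code. It assumes Windows, where "cmd /c" runs the batch file and then exits; the helper names batch_command and run_batch are hypothetical.

```python
import subprocess


def batch_command(path):
    """Build the argument list that runs a .bat file through cmd.exe."""
    # "/c" tells cmd.exe to run the given command and then terminate.
    return ["cmd", "/c", path]


def run_batch(path):
    """Run the batch file (Windows only) and return its exit code."""
    completed = subprocess.run(batch_command(path))
    return completed.returncode
```

Usage: run_batch("mybatch.bat") returns 0 if the script exits successfully. Passing check=True to subprocess.run would instead raise CalledProcessError on a non-zero exit.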



Calling a dos batch file from python

2007-09-04 Thread n o s p a m p l e a s e
Suppose I have a batch file called mybatch.bat  and I want to run it
from a python script. How can I call this batch file in python script?

Thanx/NSP
